ASSESSING EXTREMA OF EMPIRICAL PRINCIPAL
COMPONENT FUNCTIONS

By Peter Hall and Celine Vial 

Australian National University 

The difficulties of estimating and representing the distributions 
of functional data mean that principal component methods play a 
substantially greater role in functional data analysis than in more 
conventional finite-dimensional settings. Local maxima and minima 
in principal component functions are of direct importance; they in- 
dicate places in the domain of a random function where influence 
on the function value tends to be relatively strong but of opposite 
sign. We explore statistical properties of the relationship between 
extrema of empirical principal component functions, and their counterparts
for the true principal component functions. It is shown that empirical
principal component functions have relatively little trouble capturing
conventional extrema, but can experience difficulty distinguishing a
"shoulder" in a curve from a small bump. For example, when the true principal
component function has a shoulder, the probability that the empirical
principal component function has instead a bump is approximately equal to 1/2.
We suggest and describe the
performance of bootstrap methods for assessing the strength of ex- 
trema. It is shown that the subsample bootstrap is more effective 
than the standard bootstrap in this regard. A "bootstrap likelihood" 
is proposed for measuring extremum strength. Exploratory numerical 
methods are suggested. 

1. Introduction. The inherent complexity of functional data analysis, as 
a distinctly infinite-dimensional and infinite-parameter (or nonparametric) 
problem, means that principal-component methods assume greater impor- 
tance in FDA than in more traditional, finite-dimensional settings. In partic- 
ular, there is often no practical opportunity for estimating, in a meaningful 
way, the "distribution" of a random function. Both the representation of 
such a distribution, and the slow convergence rates of estimators, throw up
obstacles which seem insurmountable in many cases. 

Considerations of this type argue that properties of the principal com- 
ponent functions in the distribution of a random function are often going 
to be of greater importance than properties of the distribution itself. For 
example, it will be of greater interest to assess peaks and troughs in a prin- 
cipal component function, than to look for extrema in the "density" of the 
distribution. 

The principal component functions appear explicitly in the Karhunen-Loève
representation, or expansion, of X (see Section 2.2), where they are
weighted by a conventional, scalar random variable. Each part of the func- 
tion receives the same weight, in terms of the way it contributes to the 
distribution of X . Therefore, places where a principal component function 
is relatively high, versus places where it is relatively low, have direct inter- 
pretation. In particular, local maxima and minima in principal component 
functions are of explicit importance; they point to places in the domain of 
a random function where the influence on the function value tends to be 
relatively strong but of opposite signs. Therefore, identifying extrema in 
principal component functions, from evidence furnished by their empirical 
counterparts, is an important part of principal component analysis in FDA. 

In some respects this problem is not unlike its counterpart in more classi- 
cal nonparametric function estimation, where a great deal of effort is often 
directed toward assessing the numbers of modes and local minima. See, for
example, the literature on mode testing and assessment [9, 10, 11, 17, 21, 27,
29, 30, 34, 41, 43]. For discussion of the mode in nonparametric
regression, see, for example, [42, 47, 48]. However, in important respects the 
two problems are very different. This is reflected in the fact that empiri- 
cal principal component functions are more accurate estimators of the true 
principal component functions than conventional nonparametric function es- 
timators are of the true functions. (In particular, they are root-n consistent.) 
As a result, extrema of empirical principal component functions are more 
inclined to be close to the correct position than in the case of nonparametric 
curve estimators. 

Moreover, empirical principal component functions are less likely to ex- 
hibit spurious "wiggles" in the neighborhood of a real extremum of a true 
principal component function. This property holds true quite generally, even 
if the extremum is approached in the manner of a high-degree polynomial. 
(The extremum of the function with equation $y = x^{2p}$, for a large positive
integer $p$, provides an example.) In contrast to these properties, however,
empirical principal component functions have considerable difficulty
distinguishing a "shoulder" in the true principal component function from
a small "bump" there. (We say that a point $x_0$ is a shoulder point [resp.,
a bump point] of a continuously differentiable function $f$ if $f'(x_0) = 0$
and $f'(x_0 + x)$ is either strictly positive, or strictly negative [resp., if
$f'(x_0) = 0$ and $f'(x_0 + x)\, f'(x_0 - x) < 0$] for all $x \neq 0$ in a
neighborhood of 0. In particular, the origin is a shoulder point of the
function $x^{2p+1}$, and a bump point of the function $x^{2p}$.) In such cases
the probability that the empirical principal component function also has a
shoulder, and the chance that it has instead a single bump, both converge to 1/2.
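
The definition of shoulder and bump points can be checked numerically by probing the sign of $f'$ on either side of $x_0$. The Python sketch below is only illustrative: `classify_critical_point` and the probe offset `h` are hypothetical names introduced here, and a two-point probe is a heuristic that assumes $f'$ has no other zeros in $[x_0 - h, x_0 + h]$.

```python
def classify_critical_point(fprime, x0, h=1e-3):
    """Classify x0, where f'(x0) = 0, as a 'bump' or a 'shoulder' point.

    Hypothetical helper: a two-point probe of f' at x0 - h and x0 + h, which
    assumes f' has no other zeros in [x0 - h, x0 + h].
    """
    left, right = fprime(x0 - h), fprime(x0 + h)
    if left * right < 0:
        return "bump"          # f' changes sign: local maximum or minimum
    if left * right > 0:
        return "shoulder"      # f' keeps one strict sign: horizontal inflection
    return "undetermined"      # f' vanishes at a probe point; try another h


p = 2
# f(x) = x**(2p): f'(x) = 2p x**(2p - 1), so the origin is a bump point
print(classify_critical_point(lambda x: 2 * p * x ** (2 * p - 1), 0.0))  # bump
# f(x) = x**(2p + 1): f'(x) = (2p + 1) x**(2p), so the origin is a shoulder point
print(classify_critical_point(lambda x: (2 * p + 1) * x ** (2 * p), 0.0))  # shoulder
```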

In practice, random functions are almost invariably recorded on a discrete 
grid. (The only exceptions of which we are aware occur in a small number 
of problems where the functions are recorded by analog means, and even 
there the data are discretised prior to analysis.) In some contexts the grid is 
extremely fine; one such example arises in the increasingly common problem 
of near-infrared spectroscopy, where X(t) denotes the transmission at wave- 
length t, and several thousand values of t are treated at regularly spaced 
points in an interval. At another extreme, the points at which X is recorded 
may be rather sparse. For example, in economic data there may be only a 
dozen values of X, representing monthly observations of a process that, in 
theory at least, operates in the continuum. Particularly in cases such as this 
the data are smoothed, often by spline methods, prior to obtaining the ver- 
sions of X to which statistical methodology is applied. Therefore the data 
may plausibly be supposed to be in the continuum, even if their origin is 
discrete. In this paper we work with continuous functions, rather than with 
the discrete information from which those functions are derived. 

Bootstrap methods can be used to determine the strength of an extremum, 
or to assess the possibility that there should be an extremum in a neigh- 
borhood of a point where the empirical principal component function has 
a shoulder. However, it is necessary to employ the subsample bootstrap 
where the resample size is of smaller order than the sample size (see, e.g., 
[33]). Using the same size of resample is not as effective. We shall estab- 
lish all these properties, and suggest a "bootstrap likelihood" for assessing 
extremum strength. Additionally, we shall develop exploratory numerical 
methods for addressing the prevalence of extrema and shoulders. 

Early development of methodology and theory for principal component 
analysis of functional data included work of Rao [38], and especially Dauxois, 
Pousse and Romain [12], who described asymptotic properties of eigenvalues 
and eigenvectors of sample covariance functions. See also [1, 35, 39, 44, 45]. 
The technology of FDA has been surveyed and described by Ramsay and 
Silverman ([37], Chapter 6). There, and more particularly in work of Ramsay 
and Silverman [36], functional principal component analysis is illustrated by 
application to real-data examples. Recent work includes that of Cardot [6], 
Cardot, Ferraty and Sarda [7, 8], Girard [18], James, Hastie and Sugar [26], 
Boente and Fraiman [4], Huang, Wu and Zhou [23], Mas [28] and He, Müller
and Wang [22]. 

Articles where eigenfunctions play important roles in discrimination and
related problems for functional data include those of Huang [24], who used
principal components analysis in FDA for gait recognition, and Ferraty and 
Vieu [15, 16], Glendinning and Herbert [20], Glendinning and Fleet [19] and 
Biau, Bunea and Wegkamp [3], who employed principal component functions 
in different ways for classification. Of course, the bootstrap has been used 
widely in the context of functional data analysis; see, for example, [13, 14, 
26, 31, 32, 40, 46]. 

2. Properties of extrema of empirical principal component functions. 

2.1. The case of a fixed distribution of random functions. First we define
functional principal component functions. Let $X_1, X_2, \ldots$ be random
functions on a compact interval $\mathcal I$, and let $X$ denote a generic $X_i$,
with mean $\mu = E(X)$. Put $\bar X = n^{-1} \sum_i X_i$,

$$K(u, v) = E[\{X(u) - \mu(u)\}\{X(v) - \mu(v)\}] = \sum_{j=1}^{\infty} \theta_j\, \psi_j(u)\, \psi_j(v), \tag{2.1}$$

$$\hat K(u, v) = \frac{1}{n} \sum_{i=1}^{n} \{X_i(u) - \bar X(u)\}\{X_i(v) - \bar X(v)\} = \sum_{j=1}^{\infty} \hat\theta_j\, \hat\psi_j(u)\, \hat\psi_j(v). \tag{2.2}$$

The function $K$ is the covariance function of the random process $X$. It
can also be interpreted as the kernel of the linear operator that takes a
function $\psi$ to $\int_{\mathcal I} K(\cdot, v)\, \psi(v)\, dv$. This explains the notation $K$, for kernel,
which is common in this setting. The function sequences $\psi_1, \psi_2, \ldots$ and
$\hat\psi_1, \hat\psi_2, \ldots$ each comprise an orthonormal basis for the space of
square-integrable functions on $\mathcal I$. They represent the sequence of "true"
principal component functions, and the sequence of empirical principal
component functions, respectively. Each pair $(\theta_j, \psi_j)$ in (2.1) represents an
(eigenvalue, eigenvector) pair for the linear operator with kernel $K$.

The validity of (2.1) and (2.2) follows from standard results in analysis; 
see, for example, [25], Chapter 4. The existence of the infinite expansions 
there is sometimes referred to as Mercer's theorem, although that name is 
occasionally used for other results. See, for example, [5], Chapter 1. Since 
$\hat K$, in (2.2), is almost surely a finite-rank operator, the spectral
decomposition there is in fact truncated, in the sense that $\hat\theta_j$ vanishes for
all sufficiently large $j$.

The positive semi-definiteness of a covariance function implies that
each $\theta_j$ is nonnegative. Therefore, without loss of generality the eigenvalues
are ordered so that $\theta_1 \geq \theta_2 \geq \cdots \geq 0$. Likewise, we may assume that
$\hat\theta_1 \geq \hat\theta_2 \geq \cdots \geq 0$. If $\theta_1, \ldots, \theta_{j+1}$ are distinct then the functions
$\psi_1, \ldots, \psi_j$ are uniquely defined by (2.1), except that their signs may be
reversed. In order to match the sign of $\hat\psi_k$ to that of $\psi_k$ we shall suppose that
$\int_{\mathcal I} \hat\psi_k \psi_k \geq 0$ for each $1 \leq k \leq j$. Using this convention and assuming that
$E(\|X\|^2) < \infty$, it is readily proved that $\|\hat\psi_k - \psi_k\| \to 0$ in probability, and hence
that $\int_{\mathcal I} \hat\psi_k \psi_k \to 1$ in probability as $n \to \infty$; see, for example, [5], Chapter 4.
[Given a continuous, square-integrable function $\psi$ on $\mathcal I$, we let
$\|\psi\|^2 = \int_{\mathcal I} \psi^2$ and $\|\psi\|_\infty = \sup_{u \in \mathcal I} |\psi(u)|$.]
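
On an equally spaced grid, (2.2) and the sign convention above can be approximated by an eigendecomposition of the discretised covariance matrix, with integrals replaced by Riemann sums. The following Python sketch (using NumPy) illustrates this under stated assumptions: the helper names `empirical_pcs` and `align_sign`, the grid, and the simulated Gaussian curves built from $\psi_k(u) = 2^{1/2}\sin(k\pi u)$ on $[0, 1]$ are illustrative choices, not part of the paper.

```python
import numpy as np

def empirical_pcs(X, grid):
    """Eigenvalues and eigenfunctions of the empirical covariance (2.2).

    Hypothetical helper: rows of X are curves observed on an equally spaced
    grid, and integrals are approximated by Riemann sums with spacing h.
    """
    n = X.shape[0]
    h = grid[1] - grid[0]
    Xc = X - X.mean(axis=0)                     # X_i - bar X
    K_hat = Xc.T @ Xc / n                       # hat K(u, v) on the grid, factor 1/n
    evals, evecs = np.linalg.eigh(K_hat * h)    # operator psi -> int K(., v) psi(v) dv
    order = np.argsort(evals)[::-1]             # eigh returns ascending order
    theta_hat = evals[order]
    psi_hat = (evecs[:, order] / np.sqrt(h)).T  # rows satisfy sum(psi**2) * h = 1
    return theta_hat, psi_hat

def align_sign(psi_hat, psi, h):
    """Flip psi_hat if needed so that int psi_hat * psi >= 0."""
    return psi_hat if np.sum(psi_hat * psi) * h >= 0 else -psi_hat


# Simulated example (assumption): truncated Karhunen-Loeve expansion, Gaussian scores
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 201)
h = grid[1] - grid[0]
theta = np.array([4.0, 1.0, 0.25])
psi = np.array([np.sqrt(2.0) * np.sin(k * np.pi * grid) for k in (1, 2, 3)])
X = (rng.standard_normal((2000, 3)) * np.sqrt(theta)) @ psi
theta_hat, psi_hat = empirical_pcs(X, grid)
psi1_hat = align_sign(psi_hat[0], psi[0], h)
err = np.sqrt(np.sum((psi1_hat - psi[0]) ** 2) * h)   # ||hat psi_1 - psi_1||
print(theta_hat[:3], err)
```

The eigenvalues of the discretised operator (the matrix $\hat K$ multiplied by the grid spacing) approximate the $\hat\theta_j$, and dividing the unit eigenvectors by the square root of the spacing gives functions of unit norm in the Riemann-sum sense.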

The following assumption asks that the $\psi_k$'s have only finite numbers of
extrema and horizontal points of inflection (or shoulders), and in particular
do not vanish identically on nondegenerate subintervals of $\mathcal I$:

(2.3) For $1 \leq k \leq j$, $\psi_k$ has a Hölder-continuous derivative on $\mathcal I$, vanishing
at at most a finite number of points $u_{k1} < \cdots < u_{kq_k}$, all of which are interior
points of $\mathcal I$; and, for $1 \leq k \leq j$ and $1 \leq \ell \leq q_k$, the function $K(u_{k\ell}, \cdot)$ has
two square-integrable derivatives on $\mathcal I$.

[If $q_k = 0$ then (2.3) implies that $|\psi_k'|$ is bounded away from zero on $\mathcal I$.]
We insist that each $u_{k\ell}$ be an interior point, since results for points on
the boundary have to be framed a little differently. For example, if a local
maximum of $\psi_k$ occurs at a point $u_0$ which is an endpoint of $\mathcal I$, then the
probability that $\hat\psi_k$ has a local maximum in a neighborhood (within $\mathcal I$) of
that point does not converge to 1. In contrast, that probability does converge
to 1 if $u_0$ is in the interior of $\mathcal I$.

The next condition asks that $\psi_k$ behave like a polynomial near its extrema
and shoulders, and that it have two smooth derivatives:

(2.4) For all $1 \leq k \leq j$ and all $1 \leq \ell \leq q_k$, $\psi_k$ has two Hölder-continuous
derivatives in a neighborhood of $u_{k\ell}$, and

$$\Big(\frac{d}{du}\Big)^{r} \{\psi_k(u) - \psi_k(u_{k\ell})\} = \Big(\frac{d}{du}\Big)^{r} A_{k\ell}\, (u - u_{k\ell})^{r_{k\ell}} + O\big(|u - u_{k\ell}|^{r_{k\ell} + \eta - r}\big)$$

as $u \to u_{k\ell}$, where $A_{k\ell} \neq 0$, $r = 0$, 1 or 2, $r_{k\ell} \geq 2$ is an integer,
and $\eta > 0$.

If $r_{k\ell}$ is even then, in view of (2.4), $u_{k\ell}$ gives a local maximum or local
minimum of $\psi_k$ according as $A_{k\ell} < 0$ or $A_{k\ell} > 0$, respectively. If $r_{k\ell}$ is odd
then $u_{k\ell}$ is a shoulder point of $\psi_k$.

A reviewer has reported that in some real-data problems the first eigen- 
function is identically constant, or very nearly so. While we have not en- 
countered this ourselves, assumptions such as (2.4) would obviously not be 
appropriate in such cases. 

In Theorem 2.1 below we shall measure the smoothness of the $r$th derivative
of $X$ in terms of the finiteness of the moment-based Lipschitz criterion

$$\gamma_r(D, \eta) = E\Big\{\sup_{u, v \in \mathcal I} |X^{(r)}(u) - X^{(r)}(v)|^{D}\, |u - v|^{-D\eta}\Big\}, \tag{2.5}$$

where $D, \eta > 0$.

Write $\mathcal I = [a, b]$, where $-\infty < a < b < \infty$. In the statement of Theorem 2.1
below, let $\varepsilon > 0$ be any positive number not exceeding half the minimum
value, over $1 \leq k \leq j$ and $1 \leq \ell \leq q_k + 1$, of $u_{k\ell} - u_{k,\ell-1}$, where $u_{k0} = a$ and
$u_{k,q_k+1} = b$. An "$\eta$-neighborhood" of a point $u \in \mathcal I$ denotes the set of all
real numbers that are distant no more than $\eta$ from $u$. In addition to (2.5)
we shall impose conditions which imply that $E|X(u)|^D < \infty$ for each $u \in \mathcal I$.
This in turn entails $E(\|X\|_\infty^D) < \infty$.

The main aspects of Theorem 2.1 are encapsulated in the following statement:
"With probability converging to 1, the extrema of each empirical principal
component function, $\hat\psi_k$, correspond exactly to those of the respective
true principal component function, $\psi_k$, except that in the neighborhood of a
shoulder of $\psi_k$ there may, with probability 1/2, be exactly two additional
extrema. The latter are spurious extrema, in the sense that they arise through
stochastic fluctuations and do not reflect actual extrema of $\psi_k$."

Theorem 2.1. Assume the eigenvalues $\theta_1, \ldots, \theta_{j+1}$ are distinct, and
that with probability 1, $X$ has a Hölder-continuous second derivative on $\mathcal I$,
with Hölder exponent $\eta > 0$. Suppose too that (2.3) and (2.4) hold. Let $\varepsilon$ be
as defined above, and suppose $\sup_{u \in \mathcal I} E|X^{(s)}(u)|^D < \infty$ for $s = 0, 1, 2$,
and $\gamma_2(D, \eta) < \infty$, for a sufficiently large value of $D > 0$. Then, with probability
converging to 1 as $n \to \infty$, and for all $1 \leq k \leq j$ and $1 \leq \ell \leq q_k$, (a)-(c)
below hold: (a) Within each interval $[u_{k\ell} + \varepsilon, u_{k,\ell+1} - \varepsilon]$, for $1 \leq k \leq j$ and
$1 \leq \ell \leq q_k - 1$, and within each interval $[a, u_{k1} - \varepsilon]$ and $[u_{kq_k} + \varepsilon, b]$, for
$1 \leq k \leq j$, the equation $\hat\psi_k' = 0$ has no solution. (b) In each $\varepsilon$-neighborhood
of a point $u_{k\ell}$ for which $r_{k\ell}$ is even, $\hat\psi_k$ has exactly one local maximum, or
exactly one local minimum, according as $A_{k\ell} < 0$ or $A_{k\ell} > 0$, respectively,
and has no horizontal points of inflection. (c) In each $\varepsilon$-neighborhood of a
point $u_{k\ell}$ for which $r_{k\ell}$ is odd, $\hat\psi_k$ has no more than two extrema. Furthermore:
(d) When $r_{k\ell}$ is odd, the probability that $\hat\psi_k$ has no extremum, and the
probability that there are exactly two extrema (a local maximum and a local
minimum) in an $\varepsilon$-neighborhood of $u_{k\ell}$, both converge to 1/2. (e) If $r_{k\ell}$
is even, then the extremum $\hat u_{k\ell}$, say, of $\hat\psi_k$ in a neighborhood of $u_{k\ell}$ [see (b)],
satisfies

$$n^{1/\{2(r_{k\ell} - 1)\}}\, (\hat u_{k\ell} - u_{k\ell}) \to N^{1/(r_{k\ell} - 1)}$$

in distribution, where $N$ has a normal $N(0, \sigma_{k\ell}^2)$ distribution and $\sigma_{k\ell}^2 > 0$.
(f) If $r_{k\ell}$ is odd, then conditional on $\hat\psi_k$ having two extrema, $\hat u_{k\ell}^+ > \hat u_{k\ell}^-$ say,
in a neighborhood of $u_{k\ell}$ [see (d)], they satisfy

$$n^{1/\{2(r_{k\ell} - 1)\}}\, (\hat u_{k\ell}^+ - u_{k\ell},\ \hat u_{k\ell}^- - u_{k\ell}) \to \big(|N|^{1/(r_{k\ell} - 1)},\ -|N|^{1/(r_{k\ell} - 1)}\big)$$

in distribution, where $N$ is as in (e).
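
Part (d) can be explored by simulation. The NumPy sketch below uses a hypothetical setup, not one taken from the paper: Gaussian curves are generated from three orthonormal functions on $[0, 1]$, the first proportional to $(u - 1/2)^3$ so that $\psi_1$ has a shoulder at $u_0 = 1/2$, and for each sample it records whether $\hat\psi_1$ has exactly two extrema in the neighborhood $|u - 1/2| \leq 1/4$. Extrema are detected as sign changes of successive differences on the grid, a discretised stand-in for solving $\hat\psi_1' = 0$.

```python
import numpy as np

def count_extrema(values):
    """Number of interior sign changes of successive differences of a sampled curve."""
    d = np.diff(values)
    s = np.sign(d[d != 0])               # ignore exactly flat steps
    return int(np.sum(s[1:] != s[:-1]))

def orthonormal_basis(raw, h):
    """Gram-Schmidt in the discretised L2 inner product <f, g> = sum(f * g) * h."""
    basis = []
    for f in raw:
        for b in basis:
            f = f - np.sum(f * b) * h * b
        basis.append(f / np.sqrt(np.sum(f * f) * h))
    return np.array(basis)

def shoulder_experiment(n=1000, reps=300, seed=1):
    """Share of samples in which hat psi_1 has two extrema near the shoulder of psi_1."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 101)
    h = grid[1] - grid[0]
    # Hypothetical model: psi_1 proportional to (u - 1/2)^3, a shoulder at u0 = 1/2
    raw = [(grid - 0.5) ** 3, np.cos(np.pi * grid), np.cos(2 * np.pi * grid)]
    psi = orthonormal_basis(raw, h)
    theta = np.array([4.0, 1.0, 0.5])
    window = np.abs(grid - 0.5) <= 0.25  # epsilon-neighborhood with epsilon = 1/4
    two = 0
    for _ in range(reps):
        X = (rng.standard_normal((n, 3)) * np.sqrt(theta)) @ psi
        Xc = X - X.mean(axis=0)
        _, vecs = np.linalg.eigh(Xc.T @ Xc / n * h)
        psi1_hat = vecs[:, -1]           # largest eigenvalue comes last
        # extremum counts do not depend on the sign of psi1_hat
        two += count_extrema(psi1_hat[window]) == 2
    return two / reps


print(shoulder_experiment())   # compare with the limit 1/2 in part (d)
```

The returned proportion is a Monte Carlo estimate, so it should be near, not equal to, 1/2; replacing $(u - 1/2)^3$ by a function with a proper extremum would instead give one extremum with high probability, in line with part (b).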

2.2. The case of locally perturbed distributions of random functions. Results
similar to Theorem 2.1 can be obtained in the case of slight alterations
to a population, thereby giving insight into the way in which an empirical
principal component function $\hat\psi_j$ responds to small bumps in the true
principal component function $\psi_j$. For the sake of brevity we shall only summarize
the results below, rather than state them as formal theorems. We shall refer
to the small alterations as "local perturbations," where the word "local" is
used in the sense of the standard terminology "local hypothesis," not in the
sense in which "local" is interpreted in statistical smoothing.

In general, the effect of adding a perturbation does not alter the results 
described in Section 2.1, if the perturbation is of smaller order than the size, 
$n^{-1/4}$, of noise. On the other hand, the effect of noise on the perturbation
is negligible if $n^{-1/4}$ is of smaller order than the size of the perturbation.
We shall discuss these properties in detail in a particular case, where the 
added perturbation is one of the orthogonal functions themselves. There the 
theory is particularly simple and transparent. 

Let $\psi_j(u) = A(u - u_0)^3$, where $u_0$ denotes the midpoint of $\mathcal I$, and $A \neq 0$.
Thus, $u_0$ is the site of a shoulder of $\psi_j$. We shall add a small bump, of
vertical height $\delta$, at $u_0$. Let $\psi_p$, for $p \neq j$, be symmetric about $u_0$, have two
continuous derivatives on $\mathcal I$, and satisfy $\psi_p > 0$ and $\psi_p'' < 0$ on $\mathcal I$, together
with $\psi_p'(u_0) = 0$. The restrictions on $\psi_j$ and $\psi_p$ ensure that these functions
are orthogonal (about $u_0$, $\psi_j$ is odd and $\psi_p$ is even); they are of course easily
rendered orthonormal by rescaling. (Note that we are adding a bump function,
$\psi_p$, to a shoulder function, $\psi_j$, and in particular are not adding a shoulder
function to a shoulder function.)

Note particularly that under the above conditions, and for all sufficiently
small $|\delta| > 0$, the function

$$\phi_j = (1 - \delta^2)^{1/2}\, \psi_j + \delta\, \psi_p \tag{2.6}$$

has exactly two extrema, one of them at $u_0$ and the other distant $O(|\delta|)$ from
$u_0$, giving either a local maximum or a local minimum. We shall explore the
extent to which the resulting bump in the local perturbation, $\phi_j$, of $\psi_j$ is
visible in the empirical principal component function $\hat\phi_j$.
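
As a check on the claim about (2.6), the following pure-Python sketch evaluates $\phi_j$ for one concrete, hypothetical choice satisfying the stated conditions on $\mathcal I = [0, 1]$: $\psi_j(u) = c_j (u - 1/2)^3$ and $\psi_p(u) = c_p\{1 - (u - 1/2)^2\}$, with $c_j, c_p$ giving unit norms. Differentiating gives $\phi_j'(u) = x\{3(1 - \delta^2)^{1/2} c_j x - 2\delta c_p\}$ with $x = u - 1/2$, so the extrema sit at $x = 0$ and $x = 2\delta c_p/\{3(1 - \delta^2)^{1/2} c_j\} = O(|\delta|)$; the grid search locates them numerically.

```python
import math

def phi_j(u, delta):
    """phi_j = (1 - delta^2)^(1/2) psi_j + delta psi_p on [0, 1], with u0 = 1/2.

    Hypothetical choices: psi_j(u) = c_j (u - 1/2)^3 (a shoulder at u0) and
    psi_p(u) = c_p {1 - (u - 1/2)^2} (positive, concave, symmetric about u0).
    """
    c_j = math.sqrt(448.0)          # int_0^1 (u - 1/2)^6 du = 1/448
    c_p = math.sqrt(240.0 / 203.0)  # int_0^1 {1 - (u - 1/2)^2}^2 du = 203/240
    x = u - 0.5
    return math.sqrt(1.0 - delta ** 2) * c_j * x ** 3 + delta * c_p * (1.0 - x ** 2)

def extrema_locations(f, n_grid=20000):
    """Grid points of [0, 1] at which successive differences of f change sign."""
    us = [i / n_grid for i in range(n_grid + 1)]
    vals = [f(u) for u in us]
    locs, prev = [], 0
    for i in range(n_grid):
        d = vals[i + 1] - vals[i]
        s = (d > 0) - (d < 0)       # sign of the step, 0 if flat
        if s != 0:
            if prev != 0 and s != prev:
                locs.append(us[i])  # slope changed sign at grid point us[i]
            prev = s
    return locs


print(extrema_locations(lambda u: phi_j(u, 0.05)))  # maximum at 0.5, minimum just above
print(extrema_locations(lambda u: phi_j(u, 0.0)))   # []
```

With $\delta = 0.05$ the formula puts the second extremum (a local minimum) near $u \approx 0.5017$; with $\delta = 0$ only the shoulder remains and no extrema are found.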

A random function $X$ whose covariance admits the spectral expansion
(2.1) also has the expansion

$$X(u) - E\{X(u)\} = \sum_{k=1}^{\infty} \xi_k\, \psi_k(u), \tag{2.7}$$

where the random variables $\xi_k = \int_{\mathcal I} (X - EX)\, \psi_k$ are uncorrelated and have
zero mean, with $E(\xi_k^2) = \theta_k$. Result (2.7) is sometimes referred to as the
Karhunen-Loève expansion of $X - E(X)$, although that name is also used
for results such as the expansion at (2.1).

We shall take $E(X) = 0$ for simplicity, and perturb $X$ to $Y$, where

$$Y(u) = \sum_{k=1}^{\infty} \xi_k\, \phi_k(u),$$

with $\phi_k = \psi_k$ for $k \notin \{j, p\}$, $\phi_j$ as at (2.6), $\phi_p = \delta\psi_j - (1 - \delta^2)^{1/2}\psi_p$,
$0 < |\delta| < 1$, and $\xi_k$ exactly as at (2.7). The corresponding spectral decomposition
of the covariance of $Y$ is

$$L(u, v) = \operatorname{cov}\{Y(u), Y(v)\} = \sum_{k=1}^{\infty} \theta_k\, \phi_k(u)\, \phi_k(v) = K(u, v) + \delta(\theta_j - \theta_p)\{\psi_j(u)\psi_p(v) + \psi_p(u)\psi_j(v)\} + O(\delta^2)$$

as $\delta \to 0$, where $K(u, v)$ and $\theta_1, \theta_2, \ldots$ are as at (2.1).
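
The expansion of $L$ can be checked by direct multiplication; the short derivation below is offered as a reader's check rather than as the authors' argument. Since $\phi_k = \psi_k$ for $k \notin \{j, p\}$, only the $j$th and $p$th terms of the sum change, and $(1 - \delta^2)^{1/2} = 1 + O(\delta^2)$:

```latex
\begin{aligned}
\theta_j\phi_j(u)\phi_j(v) + \theta_p\phi_p(u)\phi_p(v)
  &= \theta_j\big[(1-\delta^2)\psi_j(u)\psi_j(v)
     + \delta(1-\delta^2)^{1/2}\{\psi_j(u)\psi_p(v) + \psi_p(u)\psi_j(v)\}
     + \delta^2\psi_p(u)\psi_p(v)\big] \\
  &\quad + \theta_p\big[\delta^2\psi_j(u)\psi_j(v)
     - \delta(1-\delta^2)^{1/2}\{\psi_j(u)\psi_p(v) + \psi_p(u)\psi_j(v)\}
     + (1-\delta^2)\psi_p(u)\psi_p(v)\big] \\
  &= \theta_j\psi_j(u)\psi_j(v) + \theta_p\psi_p(u)\psi_p(v)
     + \delta(\theta_j - \theta_p)\{\psi_j(u)\psi_p(v) + \psi_p(u)\psi_j(v)\} + O(\delta^2).
\end{aligned}
```

The same algebra shows that $\phi_j$ and $\phi_p$ remain orthonormal: $\int \phi_j\phi_p = \delta(1 - \delta^2)^{1/2} - \delta(1 - \delta^2)^{1/2} = 0$ and $\int \phi_p^2 = \delta^2 + (1 - \delta^2) = 1$.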

Let the perturbed random functions $Y_1, \ldots, Y_n$ be independent and identically
distributed as $Y$, and let $\delta = \delta(n)$ depend on $n$, converging to zero as
$n \to \infty$. Taking $\bar Y = n^{-1} \sum_i Y_i$ and

$$\hat L(u, v) = \frac{1}{n} \sum_{i=1}^{n} \{Y_i(u) - \bar Y(u)\}\{Y_i(v) - \bar Y(v)\}, \tag{2.8}$$

being the analogue of $\hat K$ defined at (2.2) but for the random functions $Y_i$
rather than $X_i$, we may obtain a spectral decomposition in the usual way:

$$\hat L(u, v) = \sum_{k=1}^{\infty} \hat\theta_k\, \hat\phi_k(u)\, \hat\phi_k(v), \tag{2.9}$$

where $\hat\theta_k$ and $\hat\phi_k$ may be interpreted as estimators of $\theta_k$ and $\phi_k$, respectively.
The version of Theorem 2.1 pertaining to the data $Y_i$, rather than
$X_i$, includes, among other things, the following properties.

Theorem 2.2. Assume the conditions of Theorem 2.1. If $\varepsilon > 0$ is not
greater than half the length of $\mathcal I$, and $\delta = \delta(n)$ satisfies $n^{1/4}|\delta| \to \infty$, then with
probability tending to 1 as $n \to \infty$, $\hat\phi_j$ has just two extrema, and these occur
in an $\varepsilon$-neighborhood of $u_0$; and if $\delta(n) = o(n^{-1/4})$, then the probability that
$\hat\phi_j$ has just two extrema in the $\varepsilon$-neighborhood, and the chance that it has no
extrema there, both converge to 1/2. When $\delta(n) = \text{const.}\, n^{-1/4}$, the probability
that there are just two extrema in the $\varepsilon$-neighborhood converges to a number
strictly between 0 and 1. For any of these choices of $\delta$, the probability that
$\hat\phi_j$ has a horizontal point of inflection in $\mathcal I$ equals 0.

Recall that $\phi_j$ has exactly two extrema, for all sufficiently small values
of $|\delta|$. Theorem 2.2 implies that, if $|\delta|$ is of strictly larger size than $n^{-1/4}$, then
with probability tending to 1 as $n \to \infty$, the empirical principal component
function $\hat\phi_j$ has exactly this number of extrema, and so correctly indicates
the presence of the bump in $\phi_j$. However, if $|\delta|$ is of smaller size than $n^{-1/4}$
then the probability that the added bump is reflected in $\hat\phi_j$ converges only
to 1/2. Hence, in this case the bump is not so readily visible. In this sense the
"bump" in $\phi_j$ can be detected reliably if it is of larger order than $n^{-1/4}$, but
not otherwise.

Reflecting these results, it is not difficult to show that no method is capable
of reliably distinguishing between the case where no bump is present
(i.e., $\delta = 0$), and that where a bump of size $cn^{-1/4}$, for arbitrarily small but
positive $|c|$, has been added. To better appreciate this point, let us assume
that the process $X$ is Gaussian, or equivalently, that the random variables $\xi_k$
in (2.7) are independent and normally distributed, with zero means and respective
variances satisfying $\sum_k \theta_k < \infty$. Fix a sequence $\theta_1 > \theta_2 > \cdots > 0$
with this property, and fix also the eigenvectors $\psi_1, \psi_2, \ldots$, choosing $\psi_j$ and
$\psi_p$ as above. More generally, adopt the earlier construction of local perturbations,
where a single parameter $\delta$ is involved. Consider a set of just two
distributions $D$ of $Y$, the first, $D_0$ say, corresponding to $\delta = 0$, and the second,
$D_c$, depending on $c \neq 0$, corresponding to $\delta = cn^{-1/4}$. For a given value
of $\alpha \in (0, 1)$, let $\mathcal T_\alpha$ denote the class of all decision rules $T = T(n)$ that are
measurable functions of the data $\mathcal Y = \{Y_1, \ldots, Y_n\}$, which can be used to
classify the distribution of $Y$ as either $D_0$ or $D_c$, and which satisfy

$$P_{D_0}(T \text{ classifies } D \text{ as } D_0) \geq \alpha.$$

Then,

$$\limsup_{c \to 0}\, \limsup_{n \to \infty}\, \sup_{T \in \mathcal T_\alpha} P_{D_c}(T \text{ classifies } D \text{ as } D_c) \leq 1 - \alpha. \tag{2.10}$$

This result may be paraphrased by saying that "no test that is capable of
accurately detecting $D_0$ can also accurately detect $D_c$ for arbitrarily small
values of $|c|$."

3. Bootstrap-based assessment of extrema of empirical principal component
functions. We begin by describing the $m$-out-of-$n$ bootstrap in the
context of estimating principal component functions. Let $m \leq n$, and conditional
on $\mathcal X = \{X_1, \ldots, X_n\}$, draw a bootstrap resample, $\mathcal X^* = \{X_1^*, \ldots, X_m^*\}$,
by sampling randomly, with replacement, from $\mathcal X$. Define $\hat K^*$ analogously to
$\hat K$, at (2.2), and construct its spectral expansion,

$$\hat K^*(u, v) = \frac{1}{m} \sum_{i=1}^{m} \{X_i^*(u) - \bar X^*(u)\}\{X_i^*(v) - \bar X^*(v)\} = \sum_{j=1}^{\infty} \hat\theta_j^*\, \hat\psi_j^*(u)\, \hat\psi_j^*(v),$$

where $\bar X^* = m^{-1} \sum_i X_i^*$. Of course, $\hat\psi_j^*$ is the bootstrap version of $\hat\psi_j$.
We shall use the stochastic variability of extrema of $\hat\psi_j^*$ to measure the
relative "strengths" of extrema of $\hat\psi_j$.
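
A minimal sketch of one $m$-out-of-$n$ resample, assuming (as in a grid-based implementation) that each curve is stored as a row of values on an equally spaced grid and that integrals are Riemann sums. The function name `bootstrap_eigenfunction` and the simulated data are hypothetical; the point is only that $\hat K^*$ uses $m$ curves drawn with replacement, the factor $1/m$ and the resample mean $\bar X^*$.

```python
import numpy as np

def bootstrap_eigenfunction(X, grid, m, k, rng):
    """Eigenpair k (k = 1 is the largest) of hat K^* from one m-out-of-n resample.

    Hypothetical helper: rows of X are curves on an equally spaced grid, and
    integrals are approximated by Riemann sums with spacing h.
    """
    n = X.shape[0]
    h = grid[1] - grid[0]
    Xs = X[rng.integers(0, n, size=m)]          # m curves drawn with replacement
    Xc = Xs - Xs.mean(axis=0)                   # X_i^* minus the resample mean
    K_star = Xc.T @ Xc / m                      # hat K^*(u, v) on the grid
    evals, evecs = np.linalg.eigh(K_star * h)
    order = np.argsort(evals)[::-1]             # decreasing eigenvalues
    theta_star = evals[order[k - 1]]
    psi_star = evecs[:, order[k - 1]] / np.sqrt(h)   # unit norm in the Riemann sense
    return theta_star, psi_star


# Simulated example (assumption): curves from two sine components on [0, 1]
rng = np.random.default_rng(2)
grid = np.linspace(0.0, 1.0, 101)
h = grid[1] - grid[0]
psi = np.array([np.sqrt(2.0) * np.sin(j * np.pi * grid) for j in (1, 2)])
X = (rng.standard_normal((200, 2)) * np.sqrt([4.0, 1.0])) @ psi
theta_star, psi_star = bootstrap_eigenfunction(X, grid, m=50, k=1, rng=rng)
print(theta_star, np.sum(psi_star ** 2) * h)
```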

The key to our suggested method is the following property. If $m$ is chosen
large, but small relative to $n$ [i.e., in asymptotic terms, if $m = m(n) \to \infty$
but $m/n \to 0$], then the numbers of modes of $\hat\psi_k^*$, for $1 \leq k \leq j$, accurately
reflect those of $\hat\psi_k$ when the latter are viewed in an unconditional sense.
That is, with high probability a "true" extremum in $\psi_k$ will produce exactly
one extremum in $\hat\psi_k^*$; but a shoulder point in $\psi_k$ will (in asymptotic terms)
produce either no extrema in $\hat\psi_k^*$, or exactly two extrema there, each of these
outcomes occurring approximately 50% of the time. Therefore, the relative
frequencies with which bumps appear at different places in $\hat\psi_k^*$ give a good
guide to the "likelihoods" that real bumps are present in $\psi_k$.

One can construct an informal definition of the likelihood attached to
an extremum being in a particular region, as follows. Taking the estimator
$\hat\psi_k$ of $\psi_k$ as a guide, first determine a subinterval, $\mathcal J$ say, of $\mathcal I$ where a single
extremum might lie; replicate the function $\hat\psi_k^*$ and count the proportion, $p$,
of times (for a given dataset $\mathcal X$) that $\hat\psi_k^*$ has at least one extremum in $\mathcal J$;
and take

$$\pi = \max\{0,\, 2(p - \tfrac{1}{2})\} \tag{3.1}$$

to be a measure of the likelihood that there is an extremum of $\psi_k$ in $\mathcal J$.
Theorem 3.1 asserts that if there is just one point in the interior of $\mathcal J$, and
no point on the boundary of $\mathcal J$, where $\psi_k'$ vanishes, then (when $m/n \to 0$)
$\pi \to 0$ or 1 in probability according as that point is a shoulder point, or a
proper extremum, respectively.
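
Given bootstrap replicates of $\hat\psi_k^*$ on a grid, (3.1) reduces to counting. The Python sketch below assumes the replicates are already available as rows of an array; the helper names and the toy curves in the usage example are hypothetical, and extrema are detected by sign changes of successive differences within $\mathcal J$.

```python
import numpy as np

def count_extrema(values):
    """Interior sign changes of successive differences of a curve on a grid."""
    d = np.diff(values)
    s = np.sign(d[d != 0])                  # ignore exactly flat steps
    return int(np.sum(s[1:] != s[:-1]))

def bootstrap_likelihood(curves, in_J):
    """pi = max{0, 2(p - 1/2)} as at (3.1).

    curves: bootstrap replicates of hat psi_k^* (one per row) on a common grid;
    in_J:   boolean mask selecting the grid points lying in the interval J.
    Extremum counts do not depend on the sign of a curve, so no sign
    alignment is needed here.
    """
    p = float(np.mean([count_extrema(c[in_J]) >= 1 for c in curves]))
    return max(0.0, 2.0 * (p - 0.5))


# Toy illustration with hypothetical curves, not bootstrap output
grid = np.linspace(0.0, 1.0, 101)
in_J = (grid >= 0.3) & (grid <= 0.7)
bump = -(grid - 0.5) ** 2        # one interior maximum at u = 0.5
shoulder = (grid - 0.5) ** 3     # horizontal inflection, no extremum
print(bootstrap_likelihood([bump] * 10, in_J))                  # 1.0
print(bootstrap_likelihood([bump] * 5 + [shoulder] * 5, in_J))  # 0.0
```

The second call mimics the shoulder case of Theorem 3.1(c): extrema appear in half of the replicates, so $p = 1/2$ and $\pi = 0$.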

However, the results are rather different if $m = n$, or more generally if $m/n$
does not converge to zero. There, while it remains true that (with probability
close to 1) a "true" extremum in $\psi_k$ will produce exactly one extremum in
$\hat\psi_k^*$, it does not hold that a shoulder point in $\psi_k$ will produce extrema in $\hat\psi_k^*$
approximately half the time. Indeed, if $m/n$ does not converge to zero then
the proportion of times, $p$, that (in the vicinity of a shoulder point of $\psi_k$)
there is at least one extremum of $\hat\psi_k^*$ does not converge in probability to
a limit. In this sense, the standard, $n$-out-of-$n$ bootstrap is not consistent.
Nevertheless, $p$ does converge in distribution, to a random variable supported
on $[0, 1]$ and distributed symmetrically about 1/2. Intuitively, the reason for
this property is that if $m/n$ does not converge to zero then the difference
between $\hat\psi_k^*$ and $\hat\psi_k$, which is of size $m^{-1/2}$, is not of strictly larger order than
the difference between $\hat\psi_k$ and $\psi_k$, which is of size $n^{-1/2}$. As a result, the
latter difference plays a significant role in determining asymptotic properties
of the conditional distribution of extrema of $\hat\psi_k^*$.

In the theorem below, given an interval $\mathcal J \subseteq \mathcal I$, and keeping the sample
$\mathcal X$ fixed, let $\hat p^{(\nu)}(\mathcal J)$ denote the proportion of the resamples $\mathcal X^*$ for which
there are exactly $\nu$ solutions, in $\mathcal J$, of the equation $(\hat\psi_k^*)' = 0$. Let $\varepsilon > 0$ be
as defined in Section 2.1, prior to Theorem 2.1, and recall that $A_{k\ell}$ denotes
the constant appearing in (2.4).

Theorem 3.1. Assume the conditions of Theorem 2.1, and also that
$n \geq m = m(n) \geq n^c$, for some $c \in (0, 1)$, and that $1 \leq k \leq j$. Assume initially that
$m/n \to 0$. Then: (a) If $\mathcal J$ denotes either $[u_{k\ell} + \varepsilon, u_{k,\ell+1} - \varepsilon]$, for $1 \leq k \leq j$ and
$1 \leq \ell \leq q_k - 1$, or $[a, u_{k1} - \varepsilon]$ or $[u_{kq_k} + \varepsilon, b]$, for $1 \leq k \leq j$, then $\hat p^{(0)}(\mathcal J) \to 1$
in probability as $n \to \infty$. (b) If $\mathcal J$ denotes an $\varepsilon$-neighborhood of a point
$u_{k\ell}$ for which $r_{k\ell}$ is even, then $\hat p^{(1)}(\mathcal J) \to 1$ in probability, and in fact the
proportion of the solutions of $(\hat\psi_k^*)' = 0$ that give a local maximum converges
to 1, or to 0, according as $A_{k\ell} < 0$ or $A_{k\ell} > 0$, respectively. (c) If $\mathcal J$ denotes
an $\varepsilon$-neighborhood of a point $u_{k\ell}$ for which $r_{k\ell}$ is odd, then $\hat p^{(\nu)}(\mathcal J) \to 1/2$ in
probability for $\nu = 0, 2$. Next assume that $m/n \to \rho$, where $0 < \rho \leq 1$. Then
(a) and (b) above continue to hold, but (c) should be changed to: (c)' If $\mathcal J$
denotes an $\varepsilon$-neighborhood of a point $u_{k\ell}$ for which $r_{k\ell}$ is odd, then

$$\big(\hat p^{(0)}(\mathcal J),\ \hat p^{(2)}(\mathcal J)\big) \to \big(\Phi(\rho^{1/2} N),\ 1 - \Phi(\rho^{1/2} N)\big)$$

in distribution, where $\Phi$ denotes the standard normal distribution function
and the random variable $N$ has the $N(0, 1)$ distribution.

Of course, if $\rho = 1$ then $\Phi(\rho^{1/2} N)$ has the uniform distribution on $[0, 1]$.
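
The limit in part (c)' can be explored by sampling $\Phi(\rho^{1/2}N)$ directly. The pure-Python sketch below (helper names hypothetical) estimates its mean and variance. A standard bivariate-normal orthant calculation gives mean 1/2 and variance $(2\pi)^{-1}\arcsin\{\rho/(1+\rho)\}$, which equals 1/12, the uniform variance, when $\rho = 1$, and shrinks to 0 as $\rho \to 0$, matching the degenerate limit 1/2 in part (c).

```python
import math
import random

def std_normal_cdf(x):
    """Standard normal distribution function Phi, via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def limit_moments(rho, size=100_000, seed=0):
    """Monte Carlo mean and variance of Phi(rho^(1/2) N), with N ~ N(0, 1)."""
    rng = random.Random(seed)
    draws = [std_normal_cdf(math.sqrt(rho) * rng.gauss(0.0, 1.0)) for _ in range(size)]
    mean = sum(draws) / size
    var = sum((d - mean) ** 2 for d in draws) / size
    return mean, var


for rho in (1.0, 0.25):
    mean, var = limit_moments(rho)
    exact_var = math.asin(rho / (1.0 + rho)) / (2.0 * math.pi)
    print(rho, round(mean, 3), round(var, 4), round(exact_var, 4))
```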

Set up the local perturbation problem as in Section 2.2. In particular,
$\psi_j$ denotes a function with a shoulder at $u_0$ (the midpoint of $\mathcal I$), and $\psi_p$
is a concave bump, of which a small multiple, $\delta = \delta(n)$, is added to $\psi_j$ to
form the locally perturbed version, $\phi_j$, of $\psi_j$; see (2.6). As in Section 2.2,
the principal component function $\phi_j$ is estimated by $\hat\phi_j$, obtained from the
spectral decomposition (2.9) of $\hat L$, the latter defined at (2.8) in terms of the
dataset $\mathcal Y = \{Y_1, \ldots, Y_n\}$. Let $\hat L^*$ denote the bootstrap version of $\hat L$, obtained
by computing $\hat L$ not from the original dataset $\mathcal Y$ but from a bootstrap resample
$\mathcal Y^* = \{Y_1^*, \ldots, Y_m^*\}$ drawn by sampling randomly with replacement from $\mathcal Y$,
and let $\hat\phi_j^*$ denote the $j$th eigenfunction of $\hat L^*$. Assume that $m = m(n) \geq n^c$
for some $c \in (0, 1)$, and that $m/n \to 0$ as $n \to \infty$. Let $\mathcal J$ denote an
$\varepsilon$-neighborhood of $u_0$, where $\varepsilon$ is any positive number not greater than half
the length of $\mathcal I$; and define $\hat p^{(\nu)}(\mathcal J)$ to equal the proportion of times,
conditional on $\mathcal Y$, that the equation $(\hat\phi_j^*)' = 0$ has exactly $\nu$ solutions in
the interval $\mathcal J$. Assume the conditions of Theorem 2.1. Then the following
analogue of a portion of Theorem 2.2 holds:

(3.2) If $\delta = \delta(n)$ satisfies $m^{1/4}|\delta| \to \infty$ then $\hat p^{(2)}(\mathcal J) \to 1$ in probability,
whereas if $\delta(n) = o(m^{-1/4})$ then $\hat p^{(\nu)}(\mathcal J) \to 1/2$ in probability for
$\nu = 0, 2$. For either choice of $\delta$, the probability that $\hat\phi_j^*$ has a
horizontal point of inflection in $\mathcal I$ equals 0.

Expressed another way, (3.2) states that if $m^{1/4}|\delta| \to \infty$ then the "bootstrap
likelihood" $\pi$, at (3.1), will converge to 1 when applied to the interval
$\mathcal J$ and to the $j$th principal component function. That is, the bootstrap
likelihood will correctly signal that a bump, in the form of the function $\psi_p$, has
been added to the shoulder at $u_0$ in $\psi_j$. On the other hand, if $\delta(n) = o(m^{-1/4})$
then the bootstrap likelihood will converge to zero, indicating that the added
bump has been missed. Therefore, the bump may not be detected if $m$ is
too small.

In some respects it might be satisfying to have a purely empirical rule for
choosing $m$. However, the results above argue that this is neither practical
nor, in the main, actually desirable. We know from (2.10) that no empirical
rule can distinguish the case where there is no added bump from that where
a bump of size $cn^{-1/4}$, for small $|c|$, is added; and the results above show that
a bootstrap-based test can, in asymptotic terms, distinguish a bump of any
order, $\delta(n)$ say, that is strictly greater than $n^{-1/4}$. Indeed, we should choose $m$
so that $m/n \to 0$ but $\delta^4 m \to \infty$. However, in order to achieve this level of
sensitivity in a purely empirical way, without an external source of information
about bump size, we need to do in advance essentially that which we are trying
to do now: we need to use empirical evidence to approximate $\delta$ so we can choose
$m$ in order to determine empirically whether a bump, of size $\delta$, is present.

The circularity of this argument, and the fact that [in view of (2.10)] it
is virtually impossible to accurately estimate $\delta$ when the bumps are small
but barely detectable, mean that empirical choice of $m$ is not a practical
option. Instead, the problem should be addressed from an exploratory angle,
for example, starting with $m = n$ and gradually decreasing this quantity.
We expect that as $m$ decreases the stochastic fluctuations inherent in the
bootstrap will play an increasing role, generally giving rise to more extrema
in the functions $\hat\psi_k^*$. In consequence, if an extremum is not genuine, it will
tend to increase with decreasing $m$.
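
The exploratory strategy can be organized as a table of $\pi$ values, one row per resample size $m$ and one column per candidate interval $\mathcal J$, in the spirit of Table 1 in Section 4. The NumPy sketch that follows is illustrative only: `pi_table`, the simulated curves and the chosen intervals are assumptions, not the authors' code or data, and for each $m$ the same resamples are reused across intervals.

```python
import numpy as np

def count_extrema(values):
    """Interior sign changes of successive differences of a curve on a grid."""
    d = np.diff(values)
    s = np.sign(d[d != 0])
    return int(np.sum(s[1:] != s[:-1]))

def pi_table(X, grid, k, intervals, ms, B=500, seed=0):
    """Bootstrap likelihood (3.1) for each interval J and each resample size m.

    Hypothetical helper. Rows of X are curves on an equally spaced grid;
    returns {m: [pi(J) for J in intervals]}, reusing each resample for every J.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    h = grid[1] - grid[0]
    masks = [(grid >= lo) & (grid <= hi) for lo, hi in intervals]
    table = {}
    for m in ms:                                  # e.g. m = n, then smaller values
        hits = np.zeros(len(masks))
        for _ in range(B):
            Xs = X[rng.integers(0, n, size=m)]    # m-out-of-n resample
            Xc = Xs - Xs.mean(axis=0)
            _, vecs = np.linalg.eigh(Xc.T @ Xc / m * h)
            psi_star = vecs[:, -k]                # eigenvector of the k-th largest eigenvalue
            hits += [count_extrema(psi_star[mask]) >= 1 for mask in masks]
        table[m] = [max(0.0, 2.0 * (p - 0.5)) for p in hits / B]
    return table


# Simulated data (assumption): psi_1(u) = sqrt(2) sin(pi u) has a genuine maximum at 1/2
rng = np.random.default_rng(3)
grid = np.linspace(0.0, 1.0, 101)
psi = np.array([np.sqrt(2.0) * np.sin(j * np.pi * grid) for j in (1, 2, 3)])
X = (rng.standard_normal((100, 3)) * np.sqrt([4.0, 1.0, 0.5])) @ psi
table = pi_table(X, grid, k=1, intervals=[(0.2, 0.8), (0.0, 0.15)], ms=[100, 60, 30], B=200)
for m, row in table.items():
    print(m, [round(float(v), 3) for v in row])
```

Reading down a column shows how $\pi$ for one candidate extremum changes as $m$ decreases, which is the comparison the exploratory approach relies on.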

4. Numerical properties. 

4.1. Application to real data on gait cycle. This example concerns child 
gait data, studied by Olshen, Biden, Wyatt and Sutherland [31] and Rice 
and Silverman [39]. These data consist of records of the angle of the knee 
and the hip during a gait cycle, recorded at 20 equally spaced time points, 
for 39 children aged approximately five years. We shall focus on the hip data. 

Figure 1 illustrates the first and second empirical principal component
functions for one cycle. (To provide a good view of the extrema we represent
the data in the interval [-4, 15].) These two eigenfunctions correspond
to 82% of total variability, and can be interpreted as follows: the first
eigenfunction represents an overall shift with respect to the mean curve, and the
second corresponds to an increased angle for the first part of the cycle (in
the interval [0, 8]), and a delay in the second part.

Table 1
Values of $\pi$ for each extremum of the first two principal component functions
(PCFs) for the hip-movement component of the child gait data, obtained using 500
bootstrap iterations and different bootstrap sample sizes $m$. The interval $\mathcal J$ for
each extremum $u_{k\ell}$ was $[u_{k\ell} - \varepsilon, u_{k\ell} + \varepsilon]$

                  1st empirical PCF               2nd empirical PCF
Extrema       -1.5     0.5     2.5    10.5      0.5     2.5     4.5    12.5
ε                1       1       1       4        1       1       1       4
m = 10       0.764   0.144   0.008   0.976    0.476                    0.82
m = 15       0.784   0.244   0.108   0.972    0.528                    0.78
m = 20       0.808   0.432   0.084   0.972    0.628   0.028           0.808
m = 25       0.804   0.428   0.136   0.992     0.66    0.06           0.796
m = 30        0.82   0.484   0.152   0.984     0.72    0.14           0.796
m = 35       0.828   0.492    0.18   0.988     0.74   0.156           0.832
m = 39       0.848    0.62   0.196   0.992    0.764   0.164           0.828

The first and second empirical principal component functions have four extrema each, at $-1.5$, $0.5$, $2.5$ and $10.5$, and at $0.5$, $2.5$, $4.5$ and $12.5$, respectively. Table 1 summarizes the values of $\hat\pi$ obtained using 500 bootstrap iterations, with bootstrap sample size $m$ taking the values 10, 15, 20, 25, 30, 35 and 39. The value of $\varepsilon$ equals half the length of the interval $\mathcal J$.

The tabulated results suggest that the points $-0.5$ and $11.5$, and $1.5$ and $13.5$, correspond to genuine extrema for the first and second empirical principal component functions, respectively. The other extrema are spurious. In each instance the largest value of $\hat\pi$ among all of the spurious extrema is less than the smallest value of $\hat\pi$ among all the genuine extrema.

Fig. 1. Graphs of the first two empirical principal component functions for the hip-movement portion of the child gait data, plotted against time during the gait cycle. Solid and dashed lines show the first and second empirical principal component functions, respectively.

This information is helpful in choosing the level of smoothing when graphically presenting the principal component curves: the level of smoothing should be sufficiently great to remove the spurious extrema. That is readily achieved, and produces eigenfunction and covariance estimates that reflect the knowledge acquired above.

4.2. Application to simulated data. For most of the examples discussed below, synthetic data, representing random functions on the interval $\mathcal I = [0, 1]$, were generated as follows. The first principal component function was taken to belong to the family

$$\psi_1(x) = \psi_{1\lambda}(x) = c_\lambda\,(2\lambda x^3 - 3\lambda x^2 + 1.5x), \qquad -\infty < \lambda < \infty, \tag{4.1}$$

where $c_\lambda > 0$ was chosen to ensure that $\int_{\mathcal I}\psi_{1\lambda}^2 = 1$. If $\lambda < 1$ then $\psi_{1\lambda}$ has no extrema or shoulder points on $\mathcal I$; if $\lambda = 1$, $\psi_{1\lambda}$ has no extrema and just one shoulder point; if $\lambda > 1$, $\psi_{1\lambda}$ has just two extrema, at the points $[\lambda \pm \{\lambda(\lambda - 1)\}^{1/2}]/(2\lambda)$, and no shoulder point.
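As a check on the claims about (4.1), the sketch below computes $c_\lambda$ by Gauss-Legendre quadrature and finds the stationary points of $\psi_{1\lambda}$ as the zeros in $[0, 1]$ of its derivative $c_\lambda(6\lambda x^2 - 6\lambda x + 1.5)$. The function names are illustrative, not from the paper. For $\lambda = 1.2$ it returns the two extrema $[\lambda \pm \{\lambda(\lambda-1)\}^{1/2}]/(2\lambda) \approx 0.296$ and $0.704$; for $\lambda = 1$ the single double zero at $0.5$ is the shoulder point; for $\lambda = 0.85$ there are none.

```python
import numpy as np


def psi1_unnormalised(x, lam):
    """2*lam*x**3 - 3*lam*x**2 + 1.5*x, the bracket in (4.1)."""
    return 2 * lam * x**3 - 3 * lam * x**2 + 1.5 * x


def c_lambda(lam):
    """c_lambda > 0 making the integral of (c_lambda * bracket)^2 over [0, 1] equal 1.
    8-point Gauss-Legendre quadrature is exact for this degree-6 integrand."""
    nodes, weights = np.polynomial.legendre.leggauss(8)
    x = 0.5 * (nodes + 1.0)  # map nodes from [-1, 1] to [0, 1]
    integral = 0.5 * np.sum(weights * psi1_unnormalised(x, lam) ** 2)
    return 1.0 / np.sqrt(integral)


def stationary_points(lam):
    """Zeros in [0, 1] of 6*lam*x**2 - 6*lam*x + 1.5.
    Two distinct zeros: two extrema; one double zero (lam = 1): a shoulder point."""
    if lam == 0:
        return []  # the derivative is the nonzero constant 1.5
    disc = lam * (lam - 1.0)
    if disc < 0:
        return []
    r = np.sqrt(disc)
    roots = sorted({(lam - r) / (2 * lam), (lam + r) / (2 * lam)})
    return [float(u) for u in roots if 0.0 <= u <= 1.0]


for lam in (0.85, 1.0, 1.2):
    print(lam, round(float(c_lambda(lam)), 4), stationary_points(lam))
```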

This "evolution" of $\psi_{1\lambda}$ takes place smoothly as $\lambda$ is increased, as indicated by the graphs in panel (a) of Figure 2. The other principal component functions, $\psi_k = \psi_{k\lambda}$ for $k \ge 2$, were constructed by orthonormalization from the functions $\{\psi_{1\lambda}(x)\} \cup \{\sin[2\pi(k - 1)x] : k \ge 2\}$. In our numerical work we used a discrete orthonormalization procedure based on 250 equally spaced points in $\mathcal I$. Then, to build the random function $X$, we multiplied each $\psi_{k\lambda}$ by the square root of the eigenvalue $\theta_k = k^{-2}$, for $1 \le k \le 5$ (we set $\theta_k = 0$ for $k \ge 6$), and also by the $k$th member of an uncorrelated Gaussian random sequence $\eta_1, \eta_2, \ldots$ with zero means and unit variances. In this way we constructed $X = \sum_k \xi_k \psi_{k\lambda}$ [cf. (2.8)], where $\xi_k = \theta_k^{1/2}\eta_k$. In particular, $X$ is a Gaussian process with zero mean.
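A sketch of this construction follows, under assumptions the text does not spell out: the "discrete orthonormalization" is taken to be a QR factorisation (equivalently, Gram-Schmidt) on 250 equally spaced points including the endpoints 0 and 1, and the $L^2$ inner product on $\mathcal I$ is approximated by the average over those points. `build_pcfs` and `simulate_curves` are hypothetical names.

```python
import numpy as np


def build_pcfs(lam, n_grid=250, n_comp=5):
    """Orthonormalise psi_{1,lam} and sin(2*pi*(k-1)*x), k = 2..n_comp, on n_grid
    equally spaced points of [0, 1] via QR (Gram-Schmidt). Columns are scaled so that
    mean(f * g), a Riemann sum for the L2 inner product, gives the identity matrix."""
    x = np.linspace(0.0, 1.0, n_grid)
    psi_1 = 2 * lam * x**3 - 3 * lam * x**2 + 1.5 * x
    basis = [psi_1] + [np.sin(2 * np.pi * (k - 1) * x) for k in range(2, n_comp + 1)]
    q, _ = np.linalg.qr(np.column_stack(basis))
    q = q * np.sqrt(n_grid)  # Euclidean-orthonormal -> orthonormal under mean(f * g)
    if q[:, 0] @ psi_1 < 0:  # make column 0 a positive multiple of psi_1
        q[:, 0] = -q[:, 0]
    return x, q


def simulate_curves(n, lam, rng=None):
    """n curves X = sum_k xi_k psi_k with xi_k = theta_k^{1/2} eta_k, theta_k = k^{-2},
    and eta_k independent standard normal variables."""
    rng = np.random.default_rng(rng)
    x, psi = build_pcfs(lam)
    theta = 1.0 / np.arange(1, psi.shape[1] + 1) ** 2
    eta = rng.standard_normal((n, psi.shape[1]))
    return x, (eta * np.sqrt(theta)) @ psi.T


x, curves = simulate_curves(300, 1.2, rng=0)  # n = 300 curves on 250 grid points
```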

For this process, numerical results illustrating Theorem 2.1 are readily 
obtained. They show that the shapes of estimated principal component func- 
tions converge to those of true principal component functions, as measured 
by criteria such as number of extrema and the locations of those points. 
However, for the sake of brevity we shall not give details of those results. 

Panel (b) of Figure 2 graphs, for one value of $m$, the probability density of $\hat\pi$ for each of the principal component functions $\psi_1$ shown in panel (a). In the case of the solid lines in the figure, the function $\psi_1$ has no extrema and, as predicted by Theorem 3.1, the distribution of $\hat\pi$ there is concentrated at relatively low values, that is, toward zero. For the dotted lines, the function $\psi_1$ has two extrema, one a minimum and one a maximum, and, again as suggested by Theorem 3.1, the distribution of $\hat\pi$ is concentrated toward the upper end of the unit interval.

The dashed lines in Figure 2 show an intermediate case, where $\psi_1$ has a shoulder point and the distribution of $\hat\pi$ is concentrated more toward the centre of the unit interval. Here, and in all the numerical work in this section, sample size was $n = 300$, all results represent averages over 200 synthetic samples of that size, and we used 300 bootstrap iterations for computing $\hat\pi$.

Fig. 2. Illustration of Theorem 3.1(i). In panel (a) the function $\psi_1(x) = \psi_{1\lambda}(x)$, defined at (4.1), is graphed against $x$ for $\lambda = 0.85$, 1 and 1.2. Panel (b) shows plots of the probability density of $\hat\pi$, the latter defined at (3.1), computed for the first empirical principal component function in the cases of the respective values of $\lambda$. We chose $\mathcal J = [0.4, 0.6]$ (resp. $\mathcal J = [0.6, 0.8]$) when $\lambda = 0.85$ or $\lambda = 1$ (resp. $\lambda = 1.2$). The process $X$ was Gaussian, and was constructed as described in the first two paragraphs of this section. The plots in panel (b) are for $n = 300$, with $m$ the integer part of $2n^{0.6}$.

The graphs of the density of $\hat\pi$ in Figure 2 were constructed using kernel methods, in which the kernel was taken to be the standard normal density, and the bandwidth was chosen equal to 0.02, a value suggested by cross-validation. In panel (b) the subinterval $\mathcal J$ was centred at 0.5 for the cases $\lambda = 0.85$ and 1. When $\lambda = 1.2$ it was centred at 0.70, close to the local minimum of $\psi_1$ in $\mathcal I$. In each instance, $\mathcal J$ was of length 0.2.
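The density estimates just described amount to a Gaussian-kernel estimator applied to the simulated values of $\hat\pi$. A minimal version with the fixed bandwidth 0.02 is sketched below; it makes no correction for the boundaries of the unit interval, and the cross-validation step used to choose the bandwidth is not reproduced. `kde_gaussian` is a hypothetical name, and the sample values in the usage line are illustrative, not results from the paper.

```python
import numpy as np


def kde_gaussian(samples, points, bandwidth=0.02):
    """Kernel density estimate f(t) = (n h)^{-1} sum_i phi((t - s_i) / h),
    where phi is the standard normal density and h is the bandwidth."""
    samples = np.asarray(samples, dtype=float)
    points = np.asarray(points, dtype=float)
    z = (points[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (samples.size * bandwidth * np.sqrt(2.0 * np.pi))


# Usage: evaluate the estimate on a grid over the unit interval.
t = np.linspace(0.0, 1.0, 501)
density = kde_gaussian([0.48, 0.52, 0.55, 0.61, 0.9], t)
```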

Results very similar to those in Figure 2 were obtained in the non-Gaussian case where the variables $\eta_k$ used to construct $X$, four paragraphs above, were taken to have the distribution of the centred absolute value of a standard normal random variable, instead of being standard normal themselves. The probability densities of $\hat\pi$ in the three cases ($\lambda = 0.85$, 1 and 1.2) were almost as well separated as before.
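The non-Gaussian variables can be generated as below. The text states only that $|Z|$, with $Z$ standard normal, was centred, so this sketch subtracts $E|Z| = (2/\pi)^{1/2}$ and does not rescale; the resulting variance is $1 - 2/\pi$ rather than 1. Whether any rescaling was applied is an assumption left open here, and `centred_abs_normal` is a hypothetical helper.

```python
import numpy as np


def centred_abs_normal(size, rng=None):
    """|Z| - E|Z| for standard normal Z, where E|Z| = sqrt(2/pi).
    Only centring is applied, so the variance is 1 - 2/pi, not 1."""
    rng = np.random.default_rng(rng)
    return np.abs(rng.standard_normal(size)) - np.sqrt(2.0 / np.pi)


eta = centred_abs_normal((300, 5), rng=0)  # replaces the N(0, 1) draws for eta_k
```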

Next we return to the Gaussian case, but alter the definition of $X$ by taking $\psi_1(x) = \psi(x) = (x - 0.8)^3(x - 0.2)^3$, all other aspects of the construction remaining the same. The function $\psi$ is illustrated in the left-hand panel of Figure 3; it has one local minimum, at 0.5, and two shoulder points, at 0.2 and 0.8. The right-hand panel of the figure shows graphs of the density of $\hat\pi$. For each of those graphs, $\mathcal J$ is of width 0.2. The density of $\hat\pi$ has much of its mass toward $1/2$ when $\mathcal J$ straddles a shoulder point, but is concentrated close to 1 when $\mathcal J$ is centred at the local minimum, reflecting the claims made in Theorem 3.1.

Fig. 3. Illustration of Theorem 3.1(ii). Panel (a) graphs the function $\psi_1(x) = (x - 0.8)^3(x - 0.2)^3$, being the first principal component function of a new Gaussian process $X$. Panel (b) shows three curves, representing the densities of $\hat\pi$, computed for the first empirical principal component function in different settings. Sample size was $n = 300$ and $m$ is the integer part of $2n^{0.6}$.
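The distinction between the local minimum and the two shoulder points of $\psi(x) = (x - 0.8)^3(x - 0.2)^3$ can be checked numerically: at a shoulder point $\psi'$ vanishes but does not change sign. The sketch below, using NumPy's `Polynomial` class, classifies each zero of $\psi'$ in $[0, 1]$ by the signs of $\psi'$ a small distance $h$ on either side. The step $h$, the tolerances and the helper name are illustrative choices; numerically repeated roots are merged because double zeros are only computed to about $10^{-8}$ accuracy. It should report shoulder points near 0.2 and 0.8 and a minimum at 0.5.

```python
import numpy as np
from numpy.polynomial import Polynomial


def classify_stationary_points(p, lo=0.0, hi=1.0, h=1e-3, tol=1e-6):
    """Return (u, kind) for each zero u of p' in [lo, hi], where kind is 'minimum',
    'maximum' or 'shoulder' (p' has the same sign on both sides of u)."""
    dp = p.deriv()
    roots = [r.real for r in np.atleast_1d(dp.roots()) if abs(r.imag) < tol and lo <= r.real <= hi]
    merged = []
    for r in sorted(roots):  # a double zero may come out as two nearly equal roots
        if not merged or r - merged[-1] > 1e-4:
            merged.append(r)
    out = []
    for u in merged:
        left, right = np.sign(dp(u - h)), np.sign(dp(u + h))
        if left < 0 < right:
            kind = "minimum"
        elif left > 0 > right:
            kind = "maximum"
        else:
            kind = "shoulder"
        out.append((float(u), kind))
    return out


psi = Polynomial([-0.8, 1.0]) ** 3 * Polynomial([-0.2, 1.0]) ** 3  # (x - 0.8)^3 (x - 0.2)^3
print(classify_stationary_points(psi))
```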

Next we construct the process $X$ truncated to 15, rather than 5, nonnull eigenvalues. That is, we take $\theta_k = k^{-2}$ for $k = 1, \ldots, 15$, and $\theta_k = 0$ for $k \ge 16$. In this setting, panel (a) of Figure 4 graphs the proportion of variability for each eigenvalue, and suggests that the fifth principal component function is the last one that is likely to be of empirical interest. Panel (b) of Figure 4 graphs the probability density of $\hat\pi$ when $\psi_5$ contains either one shoulder point or two extrema. The two curves were obtained using kernel methods with bandwidths 0.02 and 0.01, respectively, and the interval $\mathcal J$ was chosen of length 0.05 and centred at 0.5 for the shoulder point, and at 0.7 in the case of the extrema. As expected, the function represented by the solid line is concentrated close to the value 1, but the function represented by the dashed line is supported on a longer interval. This reflects the fact that, in the case of a shoulder point, the empirical evidence for an extremum is very weak.
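Panel (a) of Figure 4 is based on the proportions $\theta_k / \sum_l \theta_l$ with $\theta_k = k^{-2}$ for $k \le 15$ and $\theta_k = 0$ otherwise. They can be computed as follows; with these eigenvalues the first component accounts for about 63% of the variability, the fifth for about 2.5%, and the first five together for about 93%.

```python
import numpy as np

theta = np.zeros(20)
theta[:15] = 1.0 / np.arange(1, 16) ** 2  # theta_k = k^{-2} for k <= 15, theta_k = 0 for k >= 16
share = theta / theta.sum()                # proportion of total variability for each eigenvalue
cumulative = np.cumsum(share)              # proportion explained by the first k components

for k in range(1, 8):
    print(f"k = {k}: share = {share[k - 1]:.4f}, cumulative = {cumulative[k - 1]:.4f}")
```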

Next we summarize simulations addressing the local perturbation theory in Section 2.2. We shall use the notation of that section. To address the result (2.10) we constructed the functions $\phi_j$ and $\phi_p$, and the process $Y$, as explained in Section 2.2, with $j = 1$, $p = 2$,

$$\psi_j(x) = 8\sqrt{7}\,(x - 0.5)^3, \qquad \psi_p(x) = -4\sqrt{5}\,(x - 0.5)^2.$$

In particular, $\int_{\mathcal I}\psi_j^2 = \int_{\mathcal I}\psi_p^2 = 1$ and $\int_{\mathcal I}\psi_j\psi_p = 0$. The random sequence used for the construction of the process $Y$ is defined exactly as described in the second paragraph of this section.
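The normalisation and orthogonality claims can be verified with Gauss-Legendre quadrature, which is exact here because the integrands are polynomials of degree at most 6. The sign of $\psi_p$ does not affect any of the three integrals. The printed values should equal 1, 1 and 0 up to rounding error.

```python
import numpy as np

# 10-point Gauss-Legendre rule mapped from [-1, 1] to [0, 1]; exact for polynomials of degree <= 19.
nodes, weights = np.polynomial.legendre.leggauss(10)
x = 0.5 * (nodes + 1.0)
w = 0.5 * weights

psi_j = 8.0 * np.sqrt(7.0) * (x - 0.5) ** 3
psi_p = -4.0 * np.sqrt(5.0) * (x - 0.5) ** 2

norm_j = w @ psi_j**2        # integral of psi_j^2 over [0, 1]
norm_p = w @ psi_p**2        # integral of psi_p^2 over [0, 1]
inner = w @ (psi_j * psi_p)  # integral of psi_j * psi_p over [0, 1]
print(norm_j, norm_p, inner)
```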
Fig. 4. Illustration of Theorem 3.1(iii). Panel (a) graphs the amount of total variation accounted for by each eigenvalue, against the number of eigenvalues. Panel (b) shows two curves, plotted against values of $\hat\pi$, representing the densities of $\hat\pi$ computed for the fifth empirical principal component function where it possesses a shoulder point (when $\lambda = 1$) or an extremum (when $\lambda = 1.2$). The interval $\mathcal J$ was $[0.45, 0.55]$ in the case of a shoulder point, and $[0.65, 0.75]$ in the case of an extremum.



Figure 5 illustrates the three cases arising in Theorem 2.2. The results there lend support to the theoretical results in Section 2.2. Indeed, the three limiting behaviours described by Theorem 2.2, namely convergence toward 1, convergence toward a second fixed limit, and convergence to an intermediate value strictly less than 1, are illustrated by the solid, the dashed and the dashed-dotted curves, respectively. In the last of these three curves the limiting value is close to 0.6.

Fig. 5. Illustration of Theorem 2.2, with sample size $n$ (from 100 to 2000) on the horizontal axis. The three curves represent the probabilities that $\phi_j$ has two extrema, and no extrema, respectively. The cases $\delta = n^{-0.1}$ (solid line), $\delta = n^{-0.25}$ (dashed-dotted line) and $\delta = n^{-0.8}$ (dashed line) represent instances where $n^{1/4}\delta \to \infty$, $n^{1/4}\delta$ is neither particularly large nor particularly small, and $n^{1/4}\delta \to 0$, respectively.

5. Theoretical arguments.

5.1. Proof of Theorem 2.1. The following result, which states large-deviation properties in the case of random functions, may be derived using relatively standard arguments. We shall therefore give only an outline proof. Here and below, $C, C_1, C_2, \ldots$ denote constants that may be taken arbitrarily large. We use $1/C, 1/C_1, 1/C_2, \ldots$ to indicate constants whose values can be taken arbitrarily small. Note too that, in results such as (5.1) below, if the result can be proved for any given value of $C$ (say, $C = C_0$), then it also holds for all $C < C_0$. This means that we could equivalently state (5.1) with $1/C$ on the left-hand side replaced by $\varepsilon$, keeping $C$ on the right-hand side, and assert that (5.1) holds for all $C, \varepsilon > 0$.

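Lemma 5.1. Assume the eigenvalues $\theta_1, \ldots, \theta_{j+1}$ are distinct, and that, with probability 1, $X$ has $r \ge 0$ Hölder-continuous derivatives on $\mathcal I$, with Hölder exponent $\eta > 0$. If $C > 0$ is given, and if $E|X^{(s)}(u)|^D < \infty$ for $s = 0, \ldots, r$ and $\gamma_r(D, \eta) < \infty$ for a sufficiently large value of $D = D(C) > 0$, then

$$P\Big\{\sup_{u \in \mathcal I}|\hat\psi_k^{(r)}(u) - \psi_k^{(r)}(u)| > n^{(1/C)-(1/2)} \text{ for some } 1 \le k \le j\Big\} = O(n^{-C}). \tag{5.1}$$

Furthermore, if $u_0 \in \mathcal I$ and $\eta' \in (0, \eta)$, and if $\gamma_r(D, \eta) < \infty$ for sufficiently large $D = D(C, \eta') > 0$, then

$$P\Big[\big|\hat\psi_k^{(r)}(u) - \hat\psi_k^{(r)}(u_0) - \{\psi_k^{(r)}(u) - \psi_k^{(r)}(u_0)\}\big| > n^{(1/C)-(1/2)}\,|u - u_0|^{\eta'} \text{ for some } u \in \mathcal I \text{ and some } 1 \le k \le j\Big] = O(n^{-C}). \tag{5.2}$$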



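Outline proof of Lemma 5.1. If $L$ is a function on $\mathcal I^2$, put $|||L|||^2 = \int_{\mathcal I^2} L^2$, $\|L(u,\cdot)\|^2 = \int_{\mathcal I} L(u,v)^2\,dv$, $\|L\|_{\sup} = \sup_{u \in \mathcal I}\|L(u,\cdot)\|$ and $L^{(r)}(u,v) = (\partial/\partial u)^r L(u,v)$. It may be proved that

$$\hat\theta_j\{\hat\psi_j^{(r)}(u) - \psi_j^{(r)}(u)\} = \int_{\mathcal I}\hat K^{(r)}(u,v)\{\hat\psi_j(v) - \psi_j(v)\}\,dv + \int_{\mathcal I}\{\hat K^{(r)}(u,v) - K^{(r)}(u,v)\}\psi_j(v)\,dv - (\hat\theta_j - \theta_j)\psi_j^{(r)}(u), \tag{5.3}$$

which entails

$$\max(0, \theta_j - |\hat\theta_j - \theta_j|)\,|\hat\psi_j^{(r)}(u) - \psi_j^{(r)}(u)| \le \|\hat K^{(r)}(u,\cdot)\|\,\|\hat\psi_j - \psi_j\| + \|\hat K^{(r)}(u,\cdot) - K^{(r)}(u,\cdot)\| + |\hat\theta_j - \theta_j|\,|\psi_j^{(r)}(u)|. \tag{5.4}$$

Put $\delta_j = \inf_{k \le j}(\theta_k - \theta_{k+1})$ and $\Delta = |||\hat K - K|||$. It may be proved from results of Bhatia, Davis and McIntosh [2] that $\sup_{j \ge 1}|\hat\theta_j - \theta_j| \le \Delta$, and that if $\Delta \le \tfrac12\delta_j$ then $\|\hat\psi_j - \psi_j\| \le 4\Delta/\delta_j$. On the other hand, if $\Delta > \tfrac12\delta_j$ then $4\Delta/\delta_j > 2 \ge \|\hat\psi_j - \psi_j\|$, since $\|\hat\psi_j\| = \|\psi_j\| = 1$. Therefore, $\|\hat\psi_j - \psi_j\| \le 4\Delta/\delta_j$ in all cases. When combined with (5.4), these results imply that

$$\max(0, \theta_j - \Delta)\,\|\hat\psi_j^{(r)} - \psi_j^{(r)}\|_\infty \le \Delta\{4\|\hat K^{(r)}\|_{\sup}\,\delta_j^{-1} + \|\psi_j^{(r)}\|_\infty\} + \|\hat K^{(r)} - K^{(r)}\|_{\sup}. \tag{5.5}$$

The assumptions in Lemma 5.1 imply that $\theta_j, \delta_j > 0$ and $\|K^{(r)}\|_{\sup}, \|\psi_j^{(r)}\|_\infty < \infty$. We shall show that if $\gamma_r(D, \eta) < \infty$ for sufficiently large $D = D(C) > 0$, then

$$P\{\|\hat K^{(r)} - K^{(r)}\|_{\sup} > n^{(1/C)-(1/2)}\} = O(n^{-C}). \tag{5.6}$$

A similar but simpler argument shows that the same bound applies if $\|\hat K^{(r)} - K^{(r)}\|_{\sup}$ is replaced by $\Delta$. These results and (5.5) imply the bound (5.1) in the case $k = j$. Other values of $k$ may be treated similarly.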
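The two eigen-perturbation facts used above can be checked numerically in a matrix analogue of the problem, in which the covariance operator is replaced by a symmetric matrix and $\Delta$ by the Frobenius norm of $\hat K - K$. This is only a sanity check, not the argument of Bhatia, Davis and McIntosh [2]: in the matrix setting the eigenvalue bound is Weyl's inequality, and the eigenvector bound follows from a Davis-Kahan type inequality, which gives the constant $2^{3/2} < 4$. The random test matrices and the helper name `check_bounds` are assumptions of the illustration.

```python
import numpy as np


def check_bounds(p=30, j=2, scale=0.01, seed=0):
    """Matrix analogue of two facts used in the proof of Lemma 5.1:
    max_k |theta_hat_k - theta_k| <= Delta, and ||psi_hat_j - psi_j|| <= 4 Delta / delta_j
    after sign alignment, where Delta is the Frobenius norm of K_hat - K and
    delta_j = min_{k <= j} (theta_k - theta_{k+1})."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((p, p)))  # random orthonormal eigenvectors
    theta = 1.0 / np.arange(1, p + 1) ** 2             # distinct, decreasing eigenvalues
    K = (q * theta) @ q.T
    E = scale * rng.standard_normal((p, p))
    K_hat = K + (E + E.T) / 2.0                        # symmetric perturbation
    delta = np.linalg.norm(K_hat - K, "fro")

    # eigh returns ascending order; reverse to match theta_1 > theta_2 > ...
    th, vec = np.linalg.eigh(K)
    th_hat, vec_hat = np.linalg.eigh(K_hat)
    th, vec = th[::-1], vec[:, ::-1]
    th_hat, vec_hat = th_hat[::-1], vec_hat[:, ::-1]

    eigenvalue_ok = np.max(np.abs(th_hat - th)) <= delta
    gap = np.min(th[:j] - th[1 : j + 1])
    v, v_hat = vec[:, j - 1], vec_hat[:, j - 1]
    v_hat = v_hat * np.sign(v_hat @ v)                 # resolve the sign ambiguity
    eigenvector_ok = np.linalg.norm(v_hat - v) <= 4.0 * delta / gap
    return bool(eigenvalue_ok), bool(eigenvector_ok)


print(check_bounds())
```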
Note that $\hat K = A_1 - A_2$, where

$$A_1(u,v) = \frac1n\sum_{i=1}^n\{X_i(u) - \mu(u)\}\{X_i(v) - \mu(v)\}, \qquad A_2(u,v) = \{\bar X(u) - \mu(u)\}\{\bar X(v) - \mu(v)\}.$$

Therefore it suffices to prove the version of (5.6) that arises if we replace $\hat K^{(r)} - K^{(r)}$ there by either $A_3^{(r)} = A_1^{(r)} - K^{(r)}$ or $A_2^{(r)}$. We shall treat only the first of these cases; the second is simpler. That is, we shall prove that

$$P\Big\{\sup_{u \in \mathcal I}\|A_3^{(r)}(u,\cdot)\| > n^{(1/C)-(1/2)}\Big\} = O(n^{-C}). \tag{5.7}$$

First we derive the version of this result when the supremum is taken only over $u \in \mathcal I(C_1)$, denoting the set of $n^{C_1}$ equally spaced points in $\mathcal I$, where $C_1 > 0$ can be arbitrarily large; see (5.9) below. Noting that $A_3^{(r)}(u,v)$ equals the mean of $n$ independent and identically distributed random variables with zero mean, and using Rosenthal's and Markov's inequalities, we may prove that if $C_2, C_3 > 0$ are given, and if

$$\sup_{u \in \mathcal I} E\{|X^{(r)}(u)|^D\} < \infty, \tag{5.8}$$

then

$$\sup_{u \in \mathcal I} E\{\|n^{1/2}A_3^{(r)}(u,\cdot)\|^D\} < \infty.$$

Therefore, by Markov's inequality,

$$\sup_{u \in \mathcal I} P\{\|A_3^{(r)}(u,\cdot)\| > n^{(1/C_2)-(1/2)}\} = O(n^{-D/C_2}).$$

It follows that if $C_2, C_3 > 0$ are given, and if $D = D(C_2, C_3) > 0$ is sufficiently large, then

$$\sup_{u \in \mathcal I} P\{\|A_3^{(r)}(u,\cdot)\| > n^{(1/C_2)-(1/2)}\} = O(n^{-C_3}).$$

Hence, if $C_1 > 0$ is any other constant, and if (5.8) holds for sufficiently large $D = D(C_1, C_2, C_3) > 0$, then

$$P\Big\{\sup_{u \in \mathcal I(C_1)}\|A_3^{(r)}(u,\cdot)\| > n^{(1/C_2)-(1/2)}\Big\} = O(n^{-C_3}). \tag{5.9}$$

If $C_4 > 0$ is given, if $C_1 > 0$ is sufficiently large, and if $\gamma_r(D, \eta) < \infty$ for sufficiently large $D$, then

$$E\Big\{\sup_{u \in \mathcal I}\|A_3^{(r)}(u,\cdot) - A_3^{(r)}(u',\cdot)\|\Big\} = O(n^{-C_4}),$$

where $u'$ denotes the nearest point in $\mathcal I(C_1)$ to $u \in \mathcal I$. From this bound and Markov's inequality, it follows that if $C_2, C_3 > 0$ are given, and $C_1 = C_1(C_2, C_3) > 0$ is chosen sufficiently large, then

$$P\Big\{\sup_{u \in \mathcal I}\|A_3^{(r)}(u,\cdot) - A_3^{(r)}(u',\cdot)\| > n^{(1/C_2)-(1/2)}\Big\} = O(n^{-C_3}). \tag{5.10}$$

The desired result (5.7) follows from (5.9) and (5.10), on taking $C_2 = C_3 > C$.
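To derive (5.2) we start from (5.3), which implies that

$$\begin{aligned}
\hat\theta_j\big[\hat\psi_j^{(r)}(u) - \hat\psi_j^{(r)}(u_0) - \{\psi_j^{(r)}(u) - \psi_j^{(r)}(u_0)\}\big]
&= \int_{\mathcal I}\{\hat K^{(r)}(u,v) - \hat K^{(r)}(u_0,v)\}\{\hat\psi_j(v) - \psi_j(v)\}\,dv \\
&\quad + \int_{\mathcal I}\big[\hat K^{(r)}(u,v) - K^{(r)}(u,v) - \{\hat K^{(r)}(u_0,v) - K^{(r)}(u_0,v)\}\big]\psi_j(v)\,dv \\
&\quad - (\hat\theta_j - \theta_j)\{\psi_j^{(r)}(u) - \psi_j^{(r)}(u_0)\}.
\end{aligned} \tag{5.11}$$

The conditions of the lemma imply that $\psi_j^{(r)}$ is Hölder-continuous with any exponent $\eta' < \eta$. This property, (5.11) and the arguments leading to (5.1) then give (5.2). □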

Proof of Theorem 2.1. For given $\eta > 0$, statements made below hold provided $\gamma_2(D, \eta) < \infty$ for sufficiently large $D > 0$. We shall, however, omit that qualification.

Since the function $\psi_k'$ is bounded away from zero uniformly on each of the intervals mentioned in part (a) of the theorem, part (a) follows directly from (5.1). Let $\mathcal M_{k\ell}$ denote the $\varepsilon$-neighborhood of $u_{k\ell}$ mentioned in parts (b) and (c) of the theorem. Using first (5.2) and then (2.4) it may be proved that, for $r = 1, 2$,

$$\begin{aligned}
\hat\psi_k^{(r)}(u) &= \hat\psi_k^{(r)}(u_{k\ell}) + \psi_k^{(r)}(u) - \psi_k^{(r)}(u_{k\ell}) + O_p\{n^{(1/C)-(1/2)}|u - u_{k\ell}|^{\eta'}\} \\
&= \hat\psi_k^{(r)}(u_{k\ell}) - \psi_k^{(r)}(u_{k\ell}) + (d/du)^r A_{k\ell}(u - u_{k\ell})^{r_{k\ell}} + O(|u - u_{k\ell}|^{r_{k\ell}+\eta-r}) \\
&\quad + O_p\{n^{(1/C)-(1/2)}|u - u_{k\ell}|^{\eta'}\},
\end{aligned} \tag{5.12}$$

uniformly in $u \in \mathcal M_{k\ell}$. Therefore, if $\mathcal S_{k\ell}$ denotes the set of all solutions, $u = \hat u_{k\ell}$, of the equation $\hat\psi_k'(u) = 0$ in $\mathcal M_{k\ell}$, then, taking $r = 1$ in (5.12) and noting that $\psi_k'(u_{k\ell}) = 0$, we deduce first that for each $\delta_1 > 0$, with probability converging to 1 as $n \to \infty$, all elements of $\mathcal S_{k\ell}$ lie within $n^{-1/\{2(r_{k\ell}-1)\}+\delta_1}$ of $u_{k\ell}$, and then that, for some $\delta_2 > 0$,

$$\hat u_{k\ell} - u_{k\ell} = \{-(A_{k\ell}r_{k\ell})^{-1}\hat\psi_k'(u_{k\ell})\}^{1/(r_{k\ell}-1)} + O_p\big(n^{-1/\{2(r_{k\ell}-1)\}-\delta_2}\big) \tag{5.13}$$

uniformly in $\hat u_{k\ell} \in \mathcal S_{k\ell}$.

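Substituting the expansion at (5.13) into the version of (5.12) for $r = 2$ and $u = \hat u_{k\ell}$, and writing $\hat\phi_{k\ell}$ for $\hat\psi_k''(u_{k\ell}) - \psi_k''(u_{k\ell})$, we deduce that for some $\delta_3 > 0$,

$$\begin{aligned}
\hat\psi_k''(\hat u_{k\ell}) &= \hat\phi_{k\ell} + r_{k\ell}(r_{k\ell}-1)A_{k\ell}(\hat u_{k\ell} - u_{k\ell})^{r_{k\ell}-2} + O_p\{|\hat u_{k\ell} - u_{k\ell}|^{r_{k\ell}+\eta-2} + n^{(1/C)-(1/2)}|\hat u_{k\ell} - u_{k\ell}|^{\eta'}\} \\
&= \hat\phi_{k\ell} + r_{k\ell}(r_{k\ell}-1)A_{k\ell}\{-(A_{k\ell}r_{k\ell})^{-1}\hat\psi_k'(u_{k\ell})\}^{(r_{k\ell}-2)/(r_{k\ell}-1)} + O_p\big(n^{-(r_{k\ell}-2)/\{2(r_{k\ell}-1)\}-\delta_3}\big)
\end{aligned} \tag{5.14}$$

uniformly in $\hat u_{k\ell} \in \mathcal S_{k\ell}$, where (here and below) we interpret $x^{(r_{k\ell}-2)/(r_{k\ell}-1)}$ as the $(r_{k\ell}-2)$nd power of $x^{1/(r_{k\ell}-1)}$; it is undefined if $r_{k\ell}$ is odd and $x < 0$. Of course, when $r_{k\ell} = 2$ we interpret $x^{(r_{k\ell}-2)/(r_{k\ell}-1)}$ as 1.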

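Next we derive the limiting distribution of $(\hat\psi_k'(u_{k\ell}), \hat\phi_{k\ell})$. Observe from (5.3) that

$$\hat\theta_k\{\hat\psi_k^{(r)}(u_{k\ell}) - \psi_k^{(r)}(u_{k\ell})\} = \int_{\mathcal I}\hat K^{(r)}(u_{k\ell},v)\{\hat\psi_k(v) - \psi_k(v)\}\,dv + \int_{\mathcal I}\{\hat K^{(r)}(u_{k\ell},v) - K^{(r)}(u_{k\ell},v)\}\psi_k(v)\,dv - (\hat\theta_k - \theta_k)\psi_k^{(r)}(u_{k\ell}).$$

(We shall need only $r = 1, 2$.) Using arguments of Dauxois, Pousse and Romain [12], it may be proved that

$$\begin{aligned}
\hat\theta_k - \theta_k &= U_k + o_p(n^{-1/2}), \\
\int_{\mathcal I}\{\hat K^{(r)}(u_{k\ell},v) - K^{(r)}(u_{k\ell},v)\}\psi_k(v)\,dv &= V_{k\ell} + o_p(n^{-1/2}), \\
\int_{\mathcal I}\hat K^{(r)}(u_{k\ell},v)\{\hat\psi_k(v) - \psi_k(v)\}\,dv &= \sum_{p=1,\,p \ne k}^{\nu}\theta_p W_{kp}\psi_p^{(r)}(u_{k\ell}) + n^{-1/2}R_{k\ell1}(r,\nu),
\end{aligned}$$

where

$$\begin{aligned}
U_k &= \iint\{\hat K(u,v) - K(u,v)\}\psi_k(u)\psi_k(v)\,du\,dv, \\
V_{k\ell} &= \int_{\mathcal I}\{\hat K^{(r)}(u_{k\ell},v) - K^{(r)}(u_{k\ell},v)\}\psi_k(v)\,dv, \\
W_{kp} &= (\theta_k - \theta_p)^{-1}\iint\{\hat K(u,v) - K(u,v)\}\psi_k(u)\psi_p(v)\,du\,dv,
\end{aligned}$$

and, for each $\zeta > 0$ and for $m = 1$,

$$\lim_{\nu \to \infty}\limsup_{n \to \infty}P\{|R_{k\ell m}(r,\nu)| > \zeta\} = 0. \tag{5.15}$$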
The results in the previous paragraph imply that

$$\begin{aligned}
\hat\theta_k\{\hat\psi_k^{(r)}(u_{k\ell}) - \psi_k^{(r)}(u_{k\ell})\} &= \sum_{p=1,\,p \ne k}^{\nu}\theta_p W_{kp}\psi_p^{(r)}(u_{k\ell}) + V_{k\ell} - U_k\psi_k^{(r)}(u_{k\ell}) + n^{-1/2}R_{k\ell2}(r,\nu) \\
&= \frac1n\sum_{i=1}^n B_i(r,\nu) + n^{-1/2}R_{k\ell3}(r,\nu),
\end{aligned}$$

where

$$\begin{aligned}
B_i(r,\nu) &= \sum_{p=1,\,p \ne k}^{\nu}\theta_p\psi_p^{(r)}(u_{k\ell})(\theta_k - \theta_p)^{-1}\iint\{X_i(u)X_i(v) - EX_i(u)X_i(v)\}\psi_k(u)\psi_p(v)\,du\,dv \\
&\quad + \int_{\mathcal I}\{X_i^{(r)}(u_{k\ell})X_i(v) - EX_i^{(r)}(u_{k\ell})X_i(v)\}\psi_k(v)\,dv \\
&\quad - \psi_k^{(r)}(u_{k\ell})\iint\{X_i(u)X_i(v) - EX_i(u)X_i(v)\}\psi_k(u)\psi_k(v)\,du\,dv,
\end{aligned}$$

and $R_{k\ell2}(r,\nu)$ and $R_{k\ell3}(r,\nu)$ satisfy (5.15). [In the definition of $B_i(r,\nu)$ we have, in a slight abuse of notation, used $X_i$ for $X_i - E(X_i)$; there is no loss of generality in assuming that $E(X_i) = 0$.] Taking $r = 1$, in which case $\psi_k'(u_{k\ell}) = 0$, and $r = 2$, for which $\hat\phi_{k\ell} = \hat\psi_k''(u_{k\ell}) - \psi_k''(u_{k\ell})$, we deduce that

$$\hat\theta_k\big(\hat\psi_k'(u_{k\ell}), \hat\phi_{k\ell}\big) = \frac1n\sum_{i=1}^n\big(B_i(1,\nu), B_i(2,\nu)\big) + n^{-1/2}R_{k\ell}(\nu), \tag{5.16}$$

where

$$\lim_{\nu \to \infty}\limsup_{n \to \infty}P\{|R_{k\ell}(\nu)| > \zeta\} = 0, \tag{5.17}$$

and on this occasion $|\cdot|$ denotes the Euclidean norm.

The random two-vectors $(B_i(1,\nu), B_i(2,\nu))$ are independent and identically distributed with zero means and finite variances. Therefore conventional arguments show that $n^{-1/2}\sum_i(B_i(1,\nu), B_i(2,\nu))$, and hence also $n^{1/2}(\hat\psi_k'(u_{k\ell}), \hat\phi_{k\ell})$, has an asymptotic bivariate normal distribution with finite variance.

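Return to (5.14); multiply throughout by $n^{(r_{k\ell}-2)/\{2(r_{k\ell}-1)\}}$; substitute

$$\theta_k^{-1}\,\frac1n\sum_{i=1}^n\big(B_i(1,\nu), B_i(2,\nu)\big)$$

for $(\hat\psi_k'(u_{k\ell}), \hat\phi_{k\ell})$ when a factor of the latter appears on the right-hand side of (5.14); and note (5.16) and (5.17), obtaining

$$n^{(r_{k\ell}-2)/\{2(r_{k\ell}-1)\}}\hat\psi_k''(\hat u_{k\ell}) = n^{-r_{k\ell}/\{2(r_{k\ell}-1)\}}\theta_k^{-1}\sum_{i=1}^n B_i(2,\nu) + r_{k\ell}(r_{k\ell}-1)A_{k\ell}\Big\{-(A_{k\ell}r_{k\ell})^{-1}\theta_k^{-1}n^{-1/2}\sum_{i=1}^n B_i(1,\nu)\Big\}^{(r_{k\ell}-2)/(r_{k\ell}-1)} + o_p(1),$$

uniformly in $\hat u_{k\ell} \in \mathcal S_{k\ell}$. The series $\sum_{i=1}^n B_i(2,\nu)$ equals $O_p(n^{1/2})$, which, when multiplied by $n^{-r_{k\ell}/\{2(r_{k\ell}-1)\}}$, equals $o_p(1)$. Therefore,

$$\begin{aligned}
n^{(r_{k\ell}-2)/\{2(r_{k\ell}-1)\}}\hat\psi_k''(\hat u_{k\ell}) &= r_{k\ell}(r_{k\ell}-1)A_{k\ell}\Big\{-(A_{k\ell}r_{k\ell})^{-1}\theta_k^{-1}n^{-1/2}\sum_{i=1}^n B_i(1,\nu)\Big\}^{(r_{k\ell}-2)/(r_{k\ell}-1)} + o_p(1) \\
&= r_{k\ell}(r_{k\ell}-1)A_{k\ell}N_{k\ell}(n)^{(r_{k\ell}-2)/(r_{k\ell}-1)} + o_p(1),
\end{aligned}$$

uniformly in $\hat u_{k\ell} \in \mathcal S_{k\ell}$, where the random variable $N_{k\ell}(n)$ converges in distribution to a random variable $N_{k\ell}$ with the $N(0, \sigma_{k\ell}^2)$ distribution.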

When $r_{k\ell} \ge 2$ is even, the sign of $r_{k\ell}(r_{k\ell}-1)A_{k\ell}N_{k\ell}^{(r_{k\ell}-2)/(r_{k\ell}-1)}$ is always the same as the sign of $A_{k\ell}$. [Recall the interpretation, just below (5.14), of $x^{(r_{k\ell}-2)/(r_{k\ell}-1)}$.] Therefore, when $r_{k\ell} \ge 2$ is even, all the solutions of $\hat\psi_k' = 0$ that lie in the vicinity of $u_{k\ell}$ give, with probability converging to 1 as $n \to \infty$, local maxima if $A_{k\ell} < 0$ or local minima if $A_{k\ell} > 0$. It follows that, if $r_{k\ell} \ge 2$ is even, then with probability converging to 1 as $n \to \infty$, $\hat\psi_k$ has just one extremum in the neighborhood of $u_{k\ell}$, of the same type (maximum or minimum) as the extremum of $\psi_k$ at $u_{k\ell}$. This proves part (b) of the theorem.

On the other hand, when $r_{k\ell} \ge 3$ is odd, the random variable $N_{k\ell}^{1/(r_{k\ell}-1)}$ is well defined if and only if $N_{k\ell} \ge 0$, in which case it takes either of two values, that is, plus or minus the $(r_{k\ell}-1)$st root of $N_{k\ell}$. Correspondingly, the probability that $\mathcal S_{k\ell}$ is empty converges to $P(N_{k\ell} < 0) = \tfrac12$, and the probability that $\mathcal S_{k\ell}$ contains just two elements converges to $P(N_{k\ell} > 0)$. The signs of $N_{k\ell}^{(r_{k\ell}-2)/(r_{k\ell}-1)}$ that correspond to the two values of $N_{k\ell}^{1/(r_{k\ell}-1)}$ are positive and negative, respectively. Correspondingly, the signs of $N_{k\ell}(n)^{(r_{k\ell}-2)/(r_{k\ell}-1)}$ are positive and negative, and so when $\mathcal S_{k\ell}$ contains two elements, one of them must give a local maximum and the other a local minimum. This proves parts (c) and (d) of the theorem.
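The mechanism behind parts (c) and (d) can be illustrated with the local model suggested by (5.12) and (5.13) when $r_{k\ell} = 3$: near a shoulder point, $\hat\psi_k(u) - \hat\psi_k(u_{k\ell}) \approx c\,t + A_{k\ell}t^3$, with $t = u - u_{k\ell}$ and $c = \hat\psi_k'(u_{k\ell})$ approximately $N(0, \sigma^2/n)$. This sketch ignores all higher-order terms and the perturbation $\hat\phi_{k\ell}$, and the values $A_{k\ell} = 1$, $\sigma = 1$ and $n = 300$ are arbitrary. Two stationary points, one maximum and one minimum, arise exactly when $c/A_{k\ell} < 0$, so the simulated frequency of two extrema should be close to $1/2$. `local_extrema_at_shoulder` is a hypothetical name.

```python
import numpy as np


def local_extrema_at_shoulder(c, A=1.0):
    """Stationary points t = u - u_kl of the local model c*t + A*t**3, i.e. zeros of
    c + 3*A*t**2, each labelled by the sign of the second derivative 6*A*t."""
    if c / A >= 0:
        return []  # no stationary point (c = 0 is the degenerate shoulder itself)
    t = np.sqrt(-c / (3.0 * A))
    return [(s * t, "minimum" if 6.0 * A * s * t > 0 else "maximum") for s in (-1.0, 1.0)]


rng = np.random.default_rng(0)
n, sigma = 300, 1.0  # arbitrary illustrative values
slopes = rng.normal(0.0, sigma / np.sqrt(n), size=20_000)  # stand-in for psi_hat'(u_kl)
frac_two = np.mean([len(local_extrema_at_shoulder(c)) == 2 for c in slopes])
print(frac_two)
```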

When $r_{k\ell}$ is even, that is, when $u_{k\ell}$ gives a local extremum of $\psi_k$, the first term on the right-hand side of (5.13) is always well defined, since $x^{1/(r_{k\ell}-1)}$ is well defined for all real $x$. In particular, part (e) of Theorem 2.1 follows directly from (5.13), since $n^{1/2}\hat\psi_k'(u_{k\ell})$ is asymptotically normally distributed.

When $r_{k\ell}$ is odd, that is, when $u_{k\ell}$ gives a point of inflection of $\psi_k$, the first term on the right-hand side of (5.13) is well defined only when $-(A_{k\ell}r_{k\ell})^{-1}\hat\psi_k'(u_{k\ell}) \ge 0$. The results discussed two paragraphs above now imply that, conditional on $\hat\psi_k$ having two extrema, given generically by $\hat u_{k\ell}$, they satisfy

$$n^{1/\{2(r_{k\ell}-1)\}}(\hat u_{k\ell} - u_{k\ell}) = N_{k\ell}(n)^{1/(r_{k\ell}-1)} + o_p(1),$$

where $N_{k\ell}$ is conditioned to be positive and the two solutions correspond to the positive and negative roots on the right-hand side. This gives part (f) of Theorem 2.1. □

5.2. Proof of Theorem 3.1. The proof closely parallels that of Theorem 2.1, being based on the following bootstrap version of Lemma 5.1.

Lemma 5.2. Assume the conditions of Lemma 5.1. If $C > 0$ is given, and if $\gamma_r(D, \eta) < \infty$ for a sufficiently large value of $D = D(C) > 0$, then there exists a set $\mathcal R$ of realisations of $\mathcal X$, with probability $P(\mathcal R) = 1 - O(n^{-C})$, such that whenever $\mathcal X \in \mathcal R$,

$$P\Big\{\sup_{u \in \mathcal I}|(\hat\psi_k^*)^{(r)}(u) - \hat\psi_k^{(r)}(u)| > m^{(1/C)-(1/2)} \text{ for some } 1 \le k \le j \,\Big|\, \mathcal X\Big\} \le \text{const.}\,m^{-C},$$

where, here and below, "const." is nonrandom. Furthermore, if $u_0 \in \mathcal I$ and $\eta' \in (0, \eta)$, and if $\gamma_r(D, \eta) < \infty$ for sufficiently large $D = D(C, \eta') > 0$, then whenever $\mathcal X \in \mathcal R$,

$$P\Big[\big|(\hat\psi_k^*)^{(r)}(u) - (\hat\psi_k^*)^{(r)}(u_0) - \{\hat\psi_k^{(r)}(u) - \hat\psi_k^{(r)}(u_0)\}\big| > m^{(1/C)-(1/2)}\,|u - u_0|^{\eta'} \text{ for some } u \in \mathcal I \text{ and some } 1 \le k \le j \,\Big|\, \mathcal X\Big] \le \text{const.}\,m^{-C}.$$



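Let $\mathcal M_{k\ell}$ be as in the proof of Theorem 2.1, and write $\mathcal S_{k\ell}^*$ for the set of all solutions, $u = \hat u_{k\ell}^*$, of the equation $(\hat\psi_k^*)'(u) = 0$ in $\mathcal M_{k\ell}$. Parallelling arguments in Section 5.1 we may prove the following analogues of (5.13) and (5.14): for some $\delta > 0$,

$$\hat u_{k\ell}^* - u_{k\ell} = \big[(A_{k\ell}r_{k\ell})^{-1}\{(\hat\psi_k^*)'(u_{k\ell}) - \psi_k'(u_{k\ell})\}\big]^{1/(r_{k\ell}-1)} + m^{-1/\{2(r_{k\ell}-1)\}}R_1^*(\hat u_{k\ell}^*), \tag{5.18}$$

$$(\hat\psi_k^*)''(\hat u_{k\ell}^*) = r_{k\ell}(r_{k\ell}-1)A_{k\ell}\big[(A_{k\ell}r_{k\ell})^{-1}\{(\hat\psi_k^*)'(u_{k\ell}) - \psi_k'(u_{k\ell})\}\big]^{(r_{k\ell}-2)/(r_{k\ell}-1)} + m^{-(r_{k\ell}-2)/\{2(r_{k\ell}-1)\}}R_2^*(\hat u_{k\ell}^*), \tag{5.19}$$

where, on a set $\mathcal R$ of realizations of $\mathcal X$ that satisfies $P(\mathcal R) = 1 - O(n^{-C})$,

$$P\Big\{\sup_{\hat u_{k\ell}^* \in \mathcal S_{k\ell}^*}\max\big(|R_1^*(\hat u_{k\ell}^*)|, |R_2^*(\hat u_{k\ell}^*)|\big) > n^{-\delta} \,\Big|\, \mathcal X\Big\} \le n^{-C}$$

for all $\mathcal X \in \mathcal R$. Write

$$(\hat\psi_k^*)'(u_{k\ell}) - \psi_k'(u_{k\ell}) = (\hat\psi_k^*)'(u_{k\ell}) - \hat\psi_k'(u_{k\ell}) + \hat\psi_k'(u_{k\ell}) - \psi_k'(u_{k\ell})$$

in each of (5.18) and (5.19), and note that the variables

$$Z = n^{1/2}\{\hat\psi_k'(u_{k\ell}) - \psi_k'(u_{k\ell})\}, \qquad Z^* = m^{1/2}\{(\hat\psi_k^*)'(u_{k\ell}) - \hat\psi_k'(u_{k\ell})\}$$

are each asymptotically normally distributed as $N(0, \tau^2)$, say, with $Z^*$ having this weak limit conditional on $\mathcal X$ (and hence also conditional on $Z$). Theorem 3.1 follows from these properties, on doing little more than retracing the arguments leading to Theorem 2.1.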
For example, to obtain the last part of (c)' in the case where $r_{k\ell}$ is odd, use Kolmogorov's extension theorem to write $Z = Z_1 + o_p(1)$ and $Z^* = Z_2 + o_p(1)$, where $Z_1$ and $Z_2$ have exactly $N(0, \tau^2)$ distributions, $Z_1$ is measurable in the sigma-field generated by $\mathcal X$, and $Z_2$ is $N(0, \tau^2)$ conditional on $\mathcal X$. Assume, without loss of generality, that $A_{k\ell} > 0$; the case $A_{k\ell} < 0$ may be treated similarly. Then, in view of (5.18), the probability, conditional on $\mathcal X$, that $\mathcal S_{k\ell}^*$ is nonempty (and contains exactly one element) equals

$$P(\rho^{1/2}Z_1 + Z_2 > 0 \mid Z_1) + o_p(1), \tag{5.20}$$

where $\rho = \lim(m/n)$. If $\rho = 0$ then the quantity at (5.20) converges in probability to $\tfrac12$; and if $0 < \rho \le 1$ then it equals $\Phi(\rho^{1/2}Z_3) + o_p(1)$, where $Z_3 = Z_1/\tau$ has a $N(0, 1)$ distribution.
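The final step can be illustrated by simulating the limit of (5.20). When $\rho = 0$ the limit is identically $1/2$; when $\rho = 1$, $\Phi(Z_3)$ is uniformly distributed on $(0, 1)$ by the probability integral transform, so the conditional probability remains random even asymptotically. The sketch below draws from $\Phi(\rho^{1/2}Z_3)$ for a few values of $\rho$. The function name is hypothetical, and the value $\rho = 0.2$ is included only as a finite-sample heuristic: it is roughly $m/n$ when $n = 300$ and $m = \lfloor 2n^{0.6} \rfloor = 61$, although $m/n \to 0$ under that rule.

```python
import numpy as np
from math import erf, sqrt


def std_normal_cdf(t):
    """Phi(t) for the standard normal distribution."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def limit_of_5_20(rho, size, rng=None):
    """Draws of Phi(rho^{1/2} Z_3) with Z_3 ~ N(0, 1), the limit of (5.20) when A_kl > 0.
    For rho = 0 every draw equals 1/2."""
    rng = np.random.default_rng(rng)
    z3 = rng.standard_normal(size)
    return np.vectorize(std_normal_cdf, otypes=[float])(sqrt(rho) * z3)


for rho in (0.0, 0.2, 1.0):
    draws = limit_of_5_20(rho, 20_000, rng=1)
    print(f"rho = {rho}: mean = {draws.mean():.3f}, sd = {draws.std():.3f}")
```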